ABSTRACT

The Torrance Tests of Creative Thinking (TTCT) may 
represent a breakthrough in the area of creativity research, since 
they provide a functional instrument for measuring creative potential 
in children, adolescents, and adults. However, there are certain 
technical problems with achieving interscorer reliability which may 
act as a deterrent to the widespread use of the tests. A thorough
review of the literature has revealed that computerized content 
analysis has not been used for scoring creativity tests, although it 
appears to be an appropriate approach. Therefore, a study was devised 
to develop strategies appropriate to computer scoring of the TTCT, to 
determine the effectiveness of actuarial measures in the prediction 
of scores, and to make some initial explorations regarding the 
appropriateness of the norms developed by Torrance for scoring the 
tests. Responses of 153 subjects to Verbal Form A of the tests were 
reliably rated by four trained judges, and multiple correlation 
analyses were computed. The regression equations generated through 
this process were shown to have high predictive power, and 
examination of the important predictors suggested that the TTCT has a 
single underlying dimension of verbal fluency. Category count 
variables were useful in the prediction process. (Author/SH) 
ABSTRACT



The purposes of this project were 1) to develop strategies 
appropriate to computer scoring the Torrance Tests of Creative Think- 
ing, 2) to determine the effectiveness of actuarial measures in the 
prediction of scores assigned to these tests and 3) to make initial 
explorations regarding the appropriateness of the norms developed 
by Torrance for scoring these tests. A review of the literature showed 
that no previous attempt has been made to apply a computerized content 
analytic technique to the scoring of creativity tests but that such 
an attempt is appropriate. Responses of 153 subjects to Verbal Form 
A were reliably rated by four trained judges. Multiple correlation 
analyses involving predictor variables (including "variables of oppor- 
tunity" and variables based on the Torrance norms) and the criterion 
variables of Fluency, Flexibility and Originality for the various 
activities were computed, and regression equations were generated.
Through cross-validation, these equations were shown to have high
predictive power. Examination of the nature of the important predictors
suggested the possibility that the TTCT has a single underlying
dimension: verbal fluency. Category count variables based on Flexibility
and Originality dictionaries were useful in the prediction process.
PREFACE



The findings reported herein are the result of a cooperative 
effort on the part of several persons who have made direct and in- 
direct contributions to this study. Because of the longstanding 
interest in natural language computing at The University of Connect- 
icut the writers have been able to draw upon many resources and the 
research competencies of a number of persons who have worked in this 
area of study. The work of these persons is cited in the report 
that follows. We would like to express our sincere appreciation to
each of the individuals whose previous and sometimes pioneering work 
in the area of natural language computing has helped to make our 
work somewhat more manageable. 

We are especially indebted to two former University of Connecti- 
cut researchers who have made direct contributions to the study. Dr. 
John F. Greene of the University of Bridgeport was responsible for 
developing computer programs and processing a major segment of the 
data related to the study. Chapter V of this report was prepared by 
Dr. Greene and his work also is reflected in the discussion of find- 
ings that is reported in Chapter VI. Dr. Gerald A. Fisher, presently 
at the Illinois Institute of Technology, was responsible for develop- 
ing the SCORTXT program which constituted the major programming system 
used in the study. 

The writers are indebted to Dr. E. Paul Torrance of the University
of Georgia and Dr. Robert N. Walker of Personnel Press, Inc.
who assisted us in many ways throughout the course of the project 
and who provided us with constant encouragement in this relatively 
new area of research endeavor. 

We are grateful to Carolyn Callahan who supervised the scoring 
of the Torrance Tests of Creative Thinking and to the following 
persons who served as test scorers on the project: Margaret Capellini,
Tobi Okun Cannelli, Jane Goodman, Michael Lunney, Christine deEenne
Stephen, and Sandra Wever-Bey.

We are indebted to a number of persons who generously provided 
us with data for a pilot study that paved the way for the present 
investigation. These persons include: Dr. John R. Stubbings of the
Alexandria, Virginia Public Schools, Dr. Donald J. Treffinger of
Purdue University, Dr. Richard E. Ripple of Cornell University, and
Dr. John S. Dacey of Boston College. We also are indebted to Dr.
Hugh Clark, Associate Dean of the Graduate School for Research at
The University of Connecticut and Dr. Richard V. McCann, Director of
Research for Region I of the United States Office of Education who
offered us many valuable suggestions in the preparation of the
proposal for this research.
of Connecticut Computer Center for their cooperation in processing
our data and to the many students and their teachers who participated 
in the study. 



Dieter H. Paulus 
Joseph S. Renzulli
CHAPTER I



STATEMENT OF THE PROBLEM 

Since the last decade when Guilford (1950) called attention to 
the virtual neglect of the concept of creativity by American
researchers, there has been an enormous expansion of interest and
research in the nature of this higher mental process. A myriad of
problems and controversies have surrounded work in the area of cre- 
ativity, but one of the most pressing issues continually has been 
the search for valid and reliable means of measuring creative
performance.

Workers in both the laboratory and the schools have recognized 
the need for valid and reliable devices to assess creative potential, 
but it is in the schools that the need is most acute. Curriculum 
specialists and classroom teachers alike are being asked to make 
provisions for highly creative youngsters, and provisions are being
made. But the students who benefit from these programs are being 
selected by a variety of subjective processes without the aid of a 
reliable identification instrument. Paulus and Renzulli (1969) have 
commented on this unfortunate state of affairs: "In other words, if
we cannot accurately and economically determine who our most potentially
creative youngsters are, then efforts to 'do something' for
the highly creative are analogous to prescribing medicine before
diagnosing the illness."

The recent publication of the Torrance Tests of Creative 
Thinking (Torrance, 1966a) in many respects may be regarded as a 
breakthrough in the area of creativity measurement. Based on nearly 
nine years of research and development by Torrance and his colleagues,
the tests represent a pioneering venture in that they provide the re- 
searcher and educational practitioner with a functional instrument for 
measuring creative potential in children, adolescents, and adults. In 
spite of the relatively high level of development of the Torrance
instruments, certain technical problems related to levels of training
on the part of scorers may act as deterrents to their widespread use. 
Torrance (1966c) reports correlation coefficients of interscorer 
reliability ranging from .76 to .98 for the different sub-tests;
however, he points out the possibility of errors in scoring that
arise when scorers fail to read carefully the scoring guide or to
scan adequately the weights assigned to certain dimensions of
creativity. At least one reviewer (Hoepfner, 1967) has called attention
to this problem and also has suggested that scoring the test
battery may be a relatively lengthy affair.
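
As an illustration of what an interscorer reliability coefficient of this kind expresses, the following minimal sketch computes a correlation between the scores that two scorers assign to the same set of test booklets. It assumes the Pearson product-moment coefficient, which the text does not specify; the function name and the score lists are hypothetical and are not data from Torrance (1966c).

```python
import math


def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length score lists."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length lists with at least two scores")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    # Sum of cross-products and sums of squares of deviations from the means.
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined when one scorer gives constant scores")
    return sxy / math.sqrt(sxx * syy)


# Hypothetical Fluency scores given by two scorers to the same six booklets.
scorer_a = [12, 7, 15, 9, 20, 11]
scorer_b = [11, 8, 14, 9, 19, 13]
r = pearson_r(scorer_a, scorer_b)
print(round(r, 2))
```

A coefficient near 1.0 indicates that the two scorers rank and space the booklets in nearly the same way; it does not by itself show that they assign identical scores.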

Since the advent of the General Inquirer (Stone, et al., 1966)
and other strategies, computerized content analysis has been effec- 
tively used in the solution of many research problems. However, a 
thorough review of the literature has revealed no previous research
applying machine strategies to the scoring of creativity tests. That
the approach is appropriate is attested to, nevertheless, by successful
applications in related areas of natural language processing
(Hiller, 1967; Page and Paulus, 1968; McManus, 1968; Marcotte, 1969). 

The purposes of the present research are: 1) to develop strategies
appropriate to computer scoring the Torrance Tests of Creative
Thinking (1966a); 2) to determine the effectiveness of actuarial 
measures, such as average sentence length, in the prediction of scores 
assigned to these tests; and 3) to make some initial explorations re- 
garding the appropriateness of the norms developed by Torrance (1966b) 
for scoring these tests. 
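
To make the notion of an actuarial measure concrete, the sketch below computes a few surface counts from a written response. Only average sentence length is named in the text; the remaining measures, the function name, and the simple rules used to split sentences and words are assumptions made for illustration and are not the procedures used in this study.

```python
import re


def actuarial_measures(text):
    """Return simple surface counts for one written response."""
    # Treat any run of ., ! or ? as a sentence boundary (a simplifying assumption).
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    # Treat runs of letters and apostrophes as words.
    words = re.findall(r"[A-Za-z']+", text)
    n_sentences = len(sentences)
    n_words = len(words)
    return {
        "word_count": n_words,
        "sentence_count": n_sentences,
        "avg_sentence_length": n_words / n_sentences if n_sentences else 0.0,
        "avg_word_length": sum(len(w) for w in words) / n_words if n_words else 0.0,
    }


# Hypothetical response to an "Ask" type activity.
measures = actuarial_measures("Why is he sad? Where is he going?")
print(measures)
```

Measures of this kind require no judgment about the meaning of a response, which is what makes them candidates for use as predictors of judge-assigned scores.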

The remainder of this work is divided into five chapters. The 
most pertinent related literature for both content analysis and cre- 
ativity is presented in Chapter II. Chapter III discusses the pro- 
cedures followed in each of the various phases of the research. In 
Chapters IV and V the results of this research are presented in tabu- 
lar and narrative form. A discussion of these results and their im- 
plications as well as some theoretical considerations for future re- 
search are given in Chapter VI. 
CHAPTER II



RELATED LITERATURE 



Introduction 

Since the automated scoring of responses to the Torrance Tests 
of Creative Thinking (1966a) is heavily dependent upon the strategies
of content analysis, a review of the literature dealing with various 
aspects of analyzing the content of written material will be pre- 
sented here. A general review of relevant creativity research also 
will be given. 



Content Analysis 

The label "content analysis" has been applied to many widely
divergent research methods which, as Zieky (1968, p. 16) has observed,
"have little in common other than the fundamental underlying assumption
that there exist certain elements in any given communication
which may be utilized as indicators from which inferences might be
made, or by means of which the communication itself might be
described." Most recent formal definitions, however, have tended to
restrict the applications of the term solely to those studies which
attempt the analysis of communication content in accordance with
generally accepted criteria of sound scientific research (Berelson,
1952; Cartwright, 1953; Stone, Dunphy, Smith and Ogilvie, 1966). A
concise, yet representative definition is that proposed by Holstoi
(1966, p. 10): "Content analysis is any technique for making inferences
by systematically and objectively identifying specified characteristics
of messages."

Operationally, content analysis consists of the reduction of a
text to those preselected characteristics which the researcher feels
are relevant. It may be described as "making a particular many-to-few
mapping of the text" (Stone, et al., 1966, p. 7). The process has
been described by Fisher (1968, p. 2): 

In particular, we attempt to reduce the complexity of 
a text to a limited set of potentially understandable 
events. To do this, we consider that a given meaning 
or theoretically defined quality (the numenon) may 
occur more than once, and that it may assume different 
forms of appearance (phenomena). 
In the terminology of content analysis, a numenon is called a
"tag," the grouping of related phenomena is called a "category," and
the grouping of categories relevant to a given study is called a
"dictionary." The task of the content analyst is to code the manifold
elements of a communication into instances of category membership.
The range of possible categories is practically infinite, but
the ultimate criterion for selection is quite simple. Those categories
should be utilized which allow the investigator to answer
the questions he is asking.
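
The many-to-few mapping described above can be sketched as a small program that codes a text into counts of category membership. This is a minimal illustration only: the dictionary, its category names, and the word lists are hypothetical, and the matching rule (exact lower-case word match) is an assumption that does not reproduce the procedures of either the General Inquirer or SCORTXT.

```python
import re

# Hypothetical dictionary: each category (tag) groups the surface forms
# (phenomena) that are taken to signal it.
DICTIONARY = {
    "ANIMAL": {"dog", "cat", "horse"},
    "EMOTION": {"sad", "happy", "afraid"},
}


def code_text(text, dictionary):
    """Reduce a text to counts of category membership."""
    # Invert the dictionary so each word points to its category.
    word_to_category = {
        word: category
        for category, words in dictionary.items()
        for word in words
    }
    counts = {category: 0 for category in dictionary}
    for token in re.findall(r"[a-z']+", text.lower()):
        category = word_to_category.get(token)
        if category is not None:
            counts[category] += 1
    return counts


counts = code_text("The dog is sad. Is the cat happy or sad?", DICTIONARY)
print(counts)
```

Words that belong to no category are simply ignored, which is the sense in which a long text is reduced to a few category scores.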

As Zieky (1968, p. 18) noted: "The exact manner in which elements
of the text are to be reduced to category scores has become
a major source of contention among researchers active in the field." 
Proponents of a qualitative coding procedure (George, 1959; Kracauer, 
1952) hold that an emphasis on numerical analysis restricts the range 
of the problems amenable to content analysis, and forces the re- 
searcher to ignore what might prove to be the most important variable 
in his source. In Kracauer's opinion, "many quantitative investigations
. . . mark the spot where a misplaced desire for objectivity
has failed to reveal the inner dynamics of atomized content" (1952, 
p. 642). 

The advocates of quantitative research (Lasswell, et al., 1952;
Cartwright, 1953; Stone, et al., 1966), however, present a very
strong case for their position. Their arguments stress the precision,
replicability and generalizability of their results. The advantages
of the quantitative procedures were summarized by Budd (1963, p. 25):

Quantification in content analysis, as in other research, 
leads eventually to summarizing procedures resulting in 
some sacrifice of detail.... What is gained, of course, 
is more valuable. For the analyst in reality has lost 
nothing.... He has traded some unmanageable data for
manageable information; he has exchanged his 'lost' data 
for efficiency and scientific rigor. 

Even though the debate concerning the relative merits of quantitative
and qualitative coding procedures has occupied a good deal of
the time of those concerned with content analysis, the dichotomy is
actually spurious (Holstoi, 1966). As Pool (1959, p. 192) remarked:
"It should not be assumed that qualitative methods are insightful, and
quantitative ones merely mechanical methods for checking hypotheses.
The relationship is a circular one; each provides new insights on 
which the other can feed." 

The subject of all content analysis is some mode of communication.
The object of the research, however, may be information concerning
either: (1) the communication itself; (2) the source of the
communication; or (3) the effects of the communication.

Studies of the first type have been used most frequently in the 
history of the research methodology. They require the least amount 
of inference since the researcher is concerned mainly with the attributes
of the message itself; but it should be emphasized that such
studies are not limited to mere descriptive statements (Zieky, 1968).
Communication may be compared to some relevant standard; communications
from the same source may be compared across time or across
situations of prediction; or communications from two or more sources
might be investigated in parallel (Holstoi, 1966).

Studies which attempt to ascertain "lawful relations between
events or messages and processes transpiring in the individuals who
produce . . . them" are clearly of the second type (Osgood, 1959,
p. 36). Inferences are based on the assumption that certain attributes
of the originator of a communication are reflected in variables
existing within that communication. In effect, this also is the 
basis for the scoring of tests, for from the answers that are given 
to questions appearing on the test, inferences are made about the 
presence or absence of traits in the examinee. Moreover, increases 
in the amount of verbalization that constitute responses to the test 
are accompanied by increases in the number of inferences which must 
be drawn by the test scorers. It is the second type of content 
analysis, then, around which the present research centers. 

The third type of study, though theoretically as important as
the first two, has been relatively rare. It seems probable that 
direct behavioral observation of the receiver of a message has been 
a more efficient way of discovering the effects of that message than 
content analysis has been (Zieky, 1968). 

No matter which of the above paradigms is chosen by the researcher, 
once category construction is complete he is faced with the task of 
transforming his data to instances of category membership. This process,
known as coding, tends to be extremely slow, expensive and inefficient.
The difficulties involved in the coding process led
Berelson (1952, p. 198) to warn that unless there were particularly 
good reasons for using content analysis, "it is not worth going through
the rigor of the procedure, especially when it is so arduous and so
costly of effort."

The reduction of a communication for the purposes of content 
analysis, much like the scoring of many extant verbal-oriented tests,
is tedious, repetitive, painstaking work which requires a large number
of discrete comparisons, followed by decisions based upon those
comparisons. The very nature of the task which makes it onerous for
human judges makes it perfectly open to automation. Two sets of
computer programs now exist which will perform the coding process
quickly and accurately: The General Inquirer (Stone, et al., 1966)
and SCORTXT (Fisher, 1968). 

The advantages of the computer for content analysis are manifold. 
Once text is put into machine-readable form, any number of analyses 
might be attempted at a small increase in cost. The computer functions 
as a perfectly reliable judge which does not suffer from fatigue or
lapses of attention. But perhaps an even more beneficial aspect of
automation lies in the precise specification of both category members 
and coding procedures required of the researcher. In other words, 
the content analyst must know exactly what he is doing before he is 
able to instruct a computer to perform the same task. Savings in 
time are, of course, considerable; and the researcher is freed for 
attention to intellectual rather than mechanical problems. 

It must be stressed that the introduction of automated procedures 
to perform the coding in no way releases the content analyst from the 
responsibility of carefully performing the other aspects of the research.
The computer will count whatever it has been told to count;
and it is dangerous to assume that lists of numbers are impressive,
"scientific," or correct, merely because they were generated by a
computer rather than a human judge. Both SCORTXT and General Inquirer
replace one step of a content analytic study: that of reducing the
communication to instances of category membership. If the data have
been poorly sampled, if the categories have not been well constructed,
or if the statistical analysis is inadequate or erroneous, the use of
a computer to perform the coding will be a waste of time, money, and
effort. The computer in content analysis is a tool used by the researcher,
not a means of replacing the fallible human judgment upon
which the quality of the study depends. 

An understanding of the wide scope of the applications of content 
analysis may best be gained through an examination of selected studies.
During approximately the first third of the twentieth century, content
analysis was centered on journalism. Using such variables as number
of column inches devoted to a topic, headline size and location, Speed
(1893), Garth (1916), and Willey (1926), for example, were able to
empirically substantiate their hypotheses concerning changes in focus,
emphasis, and purpose of the newspapers in which they were interested.

The utility of content analysis outside of mass media research 
was demonstrated in the studies of enemy propaganda and political 
speeches which accompanied World War II. Under pressure to make 
valid predictions of enemy actions, researchers made advances in
such methodological aspects as sampling theory, category, validity,
coder reliability and syntactic analysis (Lasswell, et al., 1952).

The continuing interest in the problems noted above has been 
documented by Barcus (1959) in his survey of the literature. He
found that the number of studies done using content analysis has
approximately doubled each decade since 1930. It seems probable
that this rate will increase as the use of computerized scoring
procedures becomes more widely accepted. One measure of the wide
use which content analysis has achieved in recent years is the diversity
of the studies reported in The General Inquirer (Stone, et
al., 1966). The relationship between personality, culture, and
themes of folktales is analyzed, as are the distinguishing features
of Presidential Nomination Acceptance Speeches, and the differences
between the language of normal and psychotic subjects. There are
also investigations of suicide notes, product images, therapy interviews,
and Huckleberry Finn.

Although research has not been as expansive as one might expect,
studies reporting the application of automated content analysis to
the scoring of verbal responses to selected tests have also been
found in the literature. Two studies utilizing the SCORTXT program
for the scoring of teacher-constructed tests have been reported
(McManus, 1968; Marcotte, 1969). An analysis intended to duplicate
the hand-scoring system of content analysis for responses to the
Thematic Apperception Test (McClelland, 1953) also has been attempted,
using The General Inquirer (Stone, et al., 1966). The researchers,
who in this study chose to score responses for the occurrence of one
theme, Need Achievement, have commented (Stone, et al., 1966, pp. 192-193):



By so limiting our interest we can attempt to construct
a system that would answer the question, Does the sentence,
paragraph, document, and so forth, contain X theme or does
it not? This, of course, is not an artificial problem.
In fact, a number of hand-scoring systems have been developed
to answer this type of question. With this in
mind, we decided to approach the task just set forth by
attempting to duplicate a hand-scoring system on the computer.
Therefore, our goal was to construct rules enabling
the computer to make decisions that are reliably
similar to the decisions of a skilled judge. In making
such an attempt, we have had in mind the following question:
At our present level of technology, what are the
possibilities of computers making interpretive judgements?
As we come nearer to an answer to this question
we also hope to illuminate more sharply some of the difficulties
inherent in translating rules governing human
strategies of decision into rules governing automatic
strategies of decision.

No attempt to apply similar strategies to the scoring of the 
Torrance Tests of Creative Thinking (1966a) has been found in the 
literature.



Creativity 

Since the last decade when Guilford (1950) called attention to 
the virtual neglect of the concept of creativity by American
researchers, there has been an enormous expansion of interest and
research in the nature of this higher mental process. Due to the
plethora of relevant research which has followed Guilford's article,
the review of the literature presented here makes no pretense at being
complete in its coverage. However, a sampling of the research most
pertinent to this study will be presented. 
Many possible (and somewhat different) definitions for creativity
have been offered. For example, Torrance (1966c) has defined cre- 
ativity as: 

A process of becoming sensitive to problems, deficiencies,
gaps in knowledge, missing elements, disharmonies,
and so on; identifying the difficulty;
searching for solutions, making guesses, or formulating
hypotheses about the deficiencies; testing
and retesting these hypotheses and possibly modify- 
ing and retesting them; and finally communicating 
the results. 

Although this definition allows the types of abilities, mental
functions, and personality characteristics that are associated with 
creativity to be operationally defined, not all scholars agree with 
Torrance. Ausubel (1963), for example, objects on the grounds that 
creativity as a highly specialized and substantive capacity cannot 
be distinguished from general intellectual abilities, personality 
variables, and problem-solving traits. On the other hand, Ghiselin
(1955) proposes simply that the measure of a creative product be 
the extent to which it restructures our universe of understanding. 

The working definition used by Stein (1955) is that a process is 
creative when it results in a novel work that is accepted as tenable
or useful or satisfying by a group at some point in time. Others 
strongly feel that creativity measurement should only be employed 
when referring to such specialized fields as art, music, and writing 
(Kreuter and Kreuter, 1964; Mueller, 1964). It is clear that no 
single definition of creativity satisfying all the workers in the 
field has yet emerged. However, it is agreed that a psychological 
trait which generally can be referred to as creativity does exist, 
and that it exists in everyone to some extent (Lowenfeld, 1959; 
Hallman, 1963). 

Although creativity does exist, the measurement of this higher
mental process has not been an easy task, for a number of reasons.
However, a variety of methods of assessing creativity has been
attempted. Buel (1961), for example, used a number of personality
instruments including the Kuder Preference Record (Kuder, 1953) to 
measure creativity. A study of the relationships between emotional 
stability, as measured by the Rorschach protocols, and creativity 
was made by Hammer (1961). An empirical study of the concept constancy
construct of Asher's neo-field theory in which the Concept
Constancy Test (1963) was used to assess creativity has also been
reported (Jacobson and Asher, 1963). Studies of the relationship
between creativity and a multiplicity of variables, including
movement responses (Griffin, 1958), novelty of stimuli (Houston
and Mednick, 1963), and "incidental stimuli" (Mendelsohn and
Griswold, 1964), have also been reported. Despite these many attempts,
an acceptable instrument for the measurement of creativity has not 
been isolated. 
The work of Guilford and his associates has contributed much to
the measurement of creativity. Using his theoretical structure of
the human intellect as the research paradigm, Guilford (1967, p. 312) 
has found that problem solving and creative production are basically 
the same phenomena. He has chosen, therefore, to study creativity 
through problem solving, continually stressing the importance of 
divergent production tests, that is, tests that require examinees 
to produce their own answers rather than choose among alternatives. 

The Torrance Tests of Creative Thinking (1966a), which in many
respects may be regarded as a breakthrough in the area of creativity
measurement, represent a rather sharp departure from the factor
type tests developed by Guilford and his associates. Torrance and
his colleagues have attempted to construct test activities that are
"models of the creative process, each involving different kinds of
thinking and each contributing something unique to the battery under
development" (1966c, p. 9). However, the products that result from
the administration of these tests are assessed in terms of Guilford's
divergent thinking factors (fluency, flexibility, originality and
elaboration). A further description of the Torrance Tests of Creative
Thinking will be presented in the following chapter, along with evidence
for the reliability and validity of the test.



Summary 

As previously mentioned, an exhaustive review of all of the 
literature dealing with both content analysis and creativity is 
clearly beyond the scope of this report. To provide some frame of 
reference, however, the most relevant research bearing on the present 
investigation was chosen for presentation. 

Although a considerable amount of research in the possible uses
of content analysis has been performed, the application of computerized
content analytic techniques to the scoring of tests of creativity
(in particular the Torrance Tests of Creative Thinking) has not been
previously attempted. From the literature reviewed here, however, the
use of the technique does appear to be a justifiable approach to the
problem of analyzing the content of verbal responses elicited by
creativity tests.
CHAPTER III



PROCEDURES 



Introduction 

This chapter will discuss the procedures which have been
employed in this research. Included will be a description of the
sample of students whose responses were analyzed; a discussion of
the reliability and validity of the TTCT, as well as a general
description of the instrument itself; a description of the techniques
used in coding the data so that they would be recognizable
by the computer; and a detailed description of the training of the
human judges who would provide the criteria against which the automated
scoring could be gauged, as well as a presentation of the
pooled reliabilities of the judges for each of the various criteria.
The steps taken in the construction of content analysis dictionaries 
will also be discussed, and a description of the computer program 
used to perform the automated content analyses will be presented. 
Finally, the scoring strategy which utilized the SCORTXT program 
will be described. 



Description of the Sample 

The sample used in this study consisted of pupils from 16 
classes in grades four, five, six, and seven from six public school 
systems in central New York State. Classes were initially selected 
on the basis of similarity with respect to distribution of verbal 
intelligence scores, distribution of socio-economic class levels, 
reading levels, and sex (that is, similar proportions of boys and 
girls among classes at any grade level). On the basis of these 
criteria, four classes at each grade level were selected, yielding 
a sample of 375 students in 16 classes. From this large sample 153 
students were selected for participation in the present study. 

Table 1 summarizes by grade level the mean, standard deviation, and 
range of scores on the Lorge Thorndike Verbal Intelligence Test
(Lorge and Thorndike, 1962). The data on the sex of the participat- 
ing pupils are summarized in Table 2. As can be seen in Table 2, 
the number of students participating differed from one grade level
to the next. This was anticipated, however, since pupils were
randomly selected from the total pool of data rather than for each
grade level as such. 

To allow for cross-validation processes the total sample of
153 subjects was randomly divided into two samples of 100 subjects
and 53 subjects respectively (Mosier, 1951). The sample of size
100 served as the developmental sample for the computerized scoring
procedure, while the cross-validation of these procedures was
performed on the sample whose size was 53.


TABLE 1

LORGE THORNDIKE VERBAL INTELLIGENCE TEST
SCORES BY GRADE LEVEL
FOR TOTAL SAMPLE (a)

Grade     N      Mean      SD      Range

  4      19     110.06    18.97    76-137
  5      33     110.50    16.82    95-143
  6      50     106.98    12.43    73-134
  7      51     112.19    13.64    88-138

(a) In grades four through six, level three, form A, verbal; in grade
seven, level four, form A, verbal.


TABLE 2

SUMMARY OF SEX DISTRIBUTION
BY GRADE LEVEL FOR TOTAL SAMPLE

Grade    Boys    Girls    Total

  4        7      12       19
  5       16      17       33
  6       23      27       50
  7       20      31       51
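The random division of the 153 subjects into a developmental sample of 100 and a cross-validation sample of 53 can be illustrated with a minimal sketch. The report does not say how the randomization was carried out; the function name, the seed, and the subject numbering below are assumptions made only for illustration.

```python
import random


def split_sample(subject_ids, n_developmental=100, seed=0):
    """Randomly divide subjects into developmental and cross-validation samples.

    The seed is a hypothetical choice that makes the split reproducible.
    """
    ids = list(subject_ids)
    rng = random.Random(seed)
    rng.shuffle(ids)
    return ids[:n_developmental], ids[n_developmental:]


# 153 hypothetical subject numbers standing in for the real ID codes.
subjects = range(1, 154)
developmental, cross_validation = split_sample(subjects)
```

Every subject falls in exactly one of the two samples, so scoring rules developed on the first sample can be checked on subjects that played no part in their development.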



Instrumentation



Description of the Torrance Test 

The measure of verbal creativity employed in this research was
the Torrance Tests of Creative Thinking, Verbal Form A (Torrance,
1966a). This battery consists of seven parallel tasks or activities
believed to bring into play somewhat different mental processes
requiring the subject to think in divergent directions in terms of
possibilities. The tasks or activities include: asking questions
about a drawing; making guesses about the causes of a pictured event;
producing ideas for improving a toy; thinking of unusual uses of a
cardboard box; and thinking of the varied possible ramifications of
an improbable event. The subtests are administered in a paper and
pencil format in one sitting requiring approximately 45 minutes of
actual test-taking time.

With the exception of Activity 6, each of the seven Activities yields
three scores for each response made by the subject. Fluency, the
first score given, "is defined as the total number of relevant
responses, relevancy being defined in terms of the requirements of
the tasks as set forth in the Instructions" (Torrance, 1966b, p. 15).
For example, in Activity 1, the Ask Activity, the Fluency score is the
number of relevant questions asked. Questions that can be answered
merely by looking at the picture, however, are not considered
relevant and therefore are not counted. The fluency scores for the
remaining activities are determined by counting the number of
relevant responses. To determine the Flexibility score, the second
score given each response, a number of categories have been
constructed by Torrance and his associates. A category here simply
means a classification or grouping of like responses; that is,
responses dealing with the same subject matter. For example, for
Activity 1, in which questions are asked about a drawing, 22
characteristics of the drawing have been isolated. Examples of
these categories are: a) the description of the figure; b) the
physical action of the figure; c) the general costume worn by the
figure; and d) the emotions of the figure. Each response or question
that is classifiable within one of the 22 categories is awarded
one point for Flexibility. All replications of the usage of any
category are deleted, and the total Flexibility score for any
activity is the sum of the points awarded to the individual responses,
that is, the number of different categories used.
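The Fluency and Flexibility rules just described can be sketched as follows. This is only an illustration under stated assumptions: the relevance judgments and category assignments are taken as already made by a judge, and the sample responses and their category labels are hypothetical rather than drawn from the Scoring Guide.

```python
def fluency(responses):
    """Fluency: the number of relevant responses."""
    return sum(1 for r in responses if r["relevant"])


def flexibility(responses):
    """Flexibility: the number of different categories used.

    Repeated use of a category earns no further credit, so the
    categories of the relevant responses are collected into a set.
    """
    return len({r["category"] for r in responses
                if r["relevant"] and r["category"] is not None})


# Hypothetical judged responses to Activity 1 (labels are illustrative).
responses = [
    {"text": "Why is he sad?", "relevant": True, "category": "emotions"},
    {"text": "Is he afraid?", "relevant": True, "category": "emotions"},
    {"text": "Who made his clothes?", "relevant": True, "category": "costume"},
    {"text": "Is he wearing a hat?", "relevant": False, "category": None},
]

fluency_score = fluency(responses)
flexibility_score = flexibility(responses)
```

In this example three responses are relevant, but only two different categories are used, so the Flexibility score is lower than the Fluency score.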



For the first five activities, for which the number of categories
for each activity ranges from 21 to 24, the flexibility score
is obtained by employing the aforementioned procedures. The
flexibility score for Activity 7, "Just Suppose", is defined as a change
or shift in attitude or focus. The first response is not scored.
Each shift or change in attitude or focus receives one point. Once
a shift is credited, duplications do not receive additional credit.

The third score given, Originality, is based on the frequency
of the response in the population. The most frequent responses are
assigned weights of zero and one, and a listing of these responses
can be found in the scoring manual. If "infrequent responses show
any creative strength and get away from the obvious, an Originality
score of two should be assigned" (Torrance, 1966b, p. 20). The
Originality score for each activity is the sum of the Originality
scores for each of the individual responses. In Activity 6, Unusual
Questions, originality weights for frequent responses are not
provided. Responses are classified as either questions which require
simple answers, questions which require complex answers, or
divergent questions. Each question is then deemed either personal or
factual, and corresponding weights are then assigned. As Torrance
notes (1966b, p. 42):

The kind of originality involved here may be somewhat
different from the kind of originality involved in the
other activities, since it is not based on statistical
infrequency of response. Experience has shown, however,
that there is a high positive correlation between
statistical infrequency and the scores assigned using the
criteria listed above, adopted from Burkhart's work
(Burkhart and Bernheim, 1963).

To summarize the scoring procedures, three scores are reported
for each subject. The Fluency and Originality scores represent the
sum of the corresponding scores attained in each of the seven
activities. The Flexibility score is computed similarly, but it is based
on six activities. Torrance cautions against combining the three
dimensions of creativity scores to obtain a grand total, although
several studies have based a criterion for creativity on such a
total.
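The summary above can be expressed as a short sketch. The data layout (a mapping from activity number to three scores) and the example values are assumptions introduced for illustration; only the summation rules come from the text.

```python
def activity_originality(weights):
    """Originality for one activity: the sum of the per-response weights."""
    return sum(weights)


def total_scores(per_activity):
    """Sum the activity scores into the three reported totals.

    per_activity maps an activity number (1-7) to its scores.
    Activity 6 has no Flexibility score, so it is left out of that total.
    """
    return {
        "fluency": sum(s["fluency"] for s in per_activity.values()),
        "flexibility": sum(s["flexibility"]
                           for a, s in per_activity.items() if a != 6),
        "originality": sum(s["originality"] for s in per_activity.values()),
    }


# Hypothetical subject: identical scores on every activity.
example = {a: {"fluency": 5, "flexibility": 3, "originality": 4}
           for a in range(1, 8)}
example[6]["flexibility"] = None  # no Flexibility score for Activity 6
totals = total_scores(example)
```

The three totals are kept separate, in line with Torrance's caution against combining them into a grand total.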



The Reliability of the Torrance Test 

The test-retest reliability of the Torrance Tests of Creative
Thinking (TTCT) has been shown to be quite high. In a test-retest
situation using alternate forms of the TTCT, Torrance has reported
reliability coefficients of .93 for Fluency, .84 for Flexibility,
and .88 for Originality when the interval between testings was
from one to two weeks. When the test interval was eight months,
reliabilities of .79 for Fluency, .61 for Flexibility and .73 for
Originality were found (Torrance, 1966c). It appears, then, that
the size of the reliability coefficient is inversely related to the
time between testing in the test-retest situation. It is important
to note, however, that the creative abilities measured by the TTCT
are susceptible to development through educational experiences. This
would indeed influence the reliabilities in the manner previously
stated. Goralski (1964) has reported reliability coefficients of
.82, .78, and .59 for Fluency, Flexibility, and Originality when
the interval was ten weeks. Mackler (1962) tested the same subjects
three times with three different forms of the Ask-and-Guess test,
each testing separated by a two-week interval. He obtained
reliabilities of .82 (first and second testings), .89 (second and third
testings), and .84 (first and third testings).

Both the inter- and intra-scorer reliabilities of the TTCT have
also been shown to be very good. Torrance (1966c, p. 18) reports
inter- and intra-scorer reliabilities to be "consistently above .90
and there have been only very small differences in means." In spite
of these high reliability coefficients, Torrance points out the
possibility of errors in scoring that exists when scorers fail to
read carefully the scoring guide or to scan adequately the weights
assigned to the Flexibility and Originality dimensions of Creativity
(Torrance, 1966b). At least one reviewer (Hoepfner, 1967) has
called attention to this problem. This shortcoming may be overcome,
however, by the use of a computer programmed to score the TTCT, for,
unlike humans, the computer can be counted on to be perfectly
reliable.

The Validity of the Torrance Test 

Evidence for the validity of the TTCT has been presented and
discussed by Torrance (1966c). The evidence for the four principal
aspects of validity (content validity, construct validity, concurrent
validity, and predictive validity) will be presented.

Making a case for the content validity of the TTCT, Torrance
states:

A consistent and deliberate effort has been made to
base the test stimuli, the test tasks, instructions,
and scoring procedures on the best theory and research
available. Analyses of the lives of indisputably
eminent, creative people, research concerning the
personalities of eminent creative people, the nature
of performances of the human mind, and the like have
been considered in making decisions regarding the
selection of test tasks. A deliberate and consistent
effort has also been made to keep the test tasks free
of technical or subject matter content (1966c, p. 24).

Due to the complexity of creativity, Torrance does not believe that
anyone can now specify the total range of test tasks necessary to
completely assess or specify creative behavior. He does believe,
however, that the test tasks assembled in the battery do sample
a wide range of the abilities of such a universe.

Evidence for the construct validity of the TTCT is presented
by Torrance for children, high school youth, adults, and for studies
of growth and learning. Since the present research utilized a
sample composed of elementary school children, only those studies
involving the use of the tests with elementary school children will
be reviewed.

Weisberg and Springer (1961) compared the personality
characteristics of highly creative and less creative fourth grade children,
using the TTCT as the creativity criterion. The highly creative
children were rated significantly higher than the less creative children
on strength of self-image, ease of recall, humor, availability of
Oedipal anxiety, and uneven ego development. These results,
according to Torrance, reflect what might be called a creative acceptance
of oneself and a greater self-awareness (1966c, p. 25). Rorschach
protocols from the same research show that children of high creativity
showed a tendency toward unconventional responses, unreal percepts,
and fanciful and imaginative treatment of the blots. In addition,
they also gave more human movement and color responses than the low
group, signs regarded as indications of imaginativeness and
creativeness among projective examiners.

In a study conducted by Torrance (1962) with highly creative
children and less creative controls, three personality
characteristics were found to differentiate the two groups. First, the highly
creative youngsters produced significantly more "wild" or silly ideas.
Second, their productions showed a high degree of Originality. Third,
their productions were characterized by humor, playfulness, and
relative relaxation. Fleming and Weintraub (1962) studied the
relationship between rigidity and creativity as measured by the TTCT.
Significant negative correlations were found between rigidity and
Fluency, Flexibility, and Originality respectively. Yamamoto (1963)
likewise reports significant correlations between creativity as
measured by the Torrance tests and measures of Originality derived
from evaluations of the imaginative stories of 20 fifth and 20 sixth
graders.

In a study involving children in grades two through seven,
Long and Henderson (1964) found that children high on creativity
were able to withhold opinions under conditions of information
inadequacy, withstand the uncertainty of an undecided state, and
resist premature closure. Studies by Lieberman (1965), Long,
Henderson, and Ziller (1965), and Torrance (1963) present further
evidence for the construct validity of the Torrance Tests of Creative
Thinking.

Due to the lack of generally acceptable criteria, the concurrent
validity of the TTCT has been difficult to assess. A number of
criteria have been employed, however, and the results of these
investigations do offer evidence worth noting.

Yamamoto (1964) analyzed the relationship between sociometric
ratings and selected sub-tests of the TTCT. The sociometric
questions used in this study were designed to measure dimensions of
creative thinking ability. Statistically significant positive
correlations were found, but this was due largely to the size of the
sample (N for this study was 459).

Torrance (1962, 1963) and Yamamoto (1962) report studies with 
elementary school children in which the criterion for concurrent 
validation of the TTCT was teacher nominations. The results of 
these studies indicate that pupils nominated by their teachers as 
most Fluent, Flexible, and Original in their thinking can be dif- 
ferentiated by their scores on the TTCT from pupils nominated as 
lowest on these dimensions of creativity. The results reported by 
Nelson (1963), who had teachers use the Q-sort technique to rank
students on creativity, agree with those reported by Torrance. 

Another criterion measure employed in concurrent validity
studies is educational achievement. Bish (1964) used the California
Achievement Test scores as criteria in a study with fourth, fifth,
and sixth grade youngsters. Correlations ranging from .36 to .42
were found between measures of Fluency, Flexibility, and Originality
and measures of achievement. When IQ was partialled out these
correlations increased. Cicirelli (1965) reports results similar to
those found by Bish. Perry (1966) found statistically significant
rank order correlations between the TTCT and subtests of the Stanford
Achievement Test. The relationship between creativity and grades
given in school was also investigated by Perry, and the correlation
found (.10) was not statistically significant.

Since the Torrance Test of Creative Thinking is a relatively
new instrument, only a limited amount of evidence for the predictive
validity is available. Torrance reports, however, that a variety
of long-range predictive validity studies are underway and that
others have been planned for the near future (1966c, p. 54). In a
study reported by Erickson (1966) creativity scores from the TTCT
administered to high school seniors in 1959 were correlated with the
scores from a checklist of creative activities administered in 1966.
Forty-four of the sixty-six high school seniors who were tested in
1959 returned the checklist. Fluency and Flexibility scores
correlated positively with the checklist criterion (.27 and .24
respectively) but Originality did not. The 44 subjects supplying checklist
data were then divided into two equal groups on the basis of
creativity test scores and tetrachoric correlations were computed for
each item. Originality, Elaboration, and Total Creativity scores
successfully predicted participation in a number of activities
(writing a poem, story, song, or play; receiving a research grant;
learning a new language, etc.) at acceptable statistical levels.

Despite the lack of rigorous evidence for predictive validity,
the use of the TTCT in various forms of research does seem warranted
as Torrance has claimed (1966c). Further, it does appear that the
cautious use of the TTCT in a school setting is appropriate, since
no better instrument for the assessment of creative potential is now
available.



Data Collection and Coding Procedures 

The data analyzed in this research consist of the responses of
153 elementary school children to the Torrance Test of Creative
Thinking, Verbal Form A (Torrance, 1966a). The tests were
administered in a group setting in accordance with the guidelines set forth
in the test manual (Torrance, 1966c).

To perform the computerized content analyses of the data it
was necessary to transcribe the responses into machine readable form.
This was accomplished by keypunching the responses on standard IBM
cards, one response to a card. Since no corrections in spelling,
punctuation, grammar, etc., were made on the original copy, the
keypunched data were an exact duplicate of the responses given in
the test booklets. A sample of the keypunched data is shown in
Figure 1. The first card listed in the Figure is a title or header
card indicating that the responses which follow were given by the
subject whose ID number was 0397. Following the title card are the
responses given by the subject. For each response the ID number is
punched in columns 1-4, the Activity number, which is "1" for the
illustrated listing, is punched in column 5, and the response is
given in columns 8-80. This subject made five responses to Activity
1; hence, the numbers in columns 6-7 range from 01 to 05. The card
with the asterisks in columns 1-2 indicates the end of the responses
for this subject. Note that for response five, the word "moment" is
keypunched as "momet" since this was the spelling given in the test
booklet. Also, although this response is stated in question form,
no question mark was supplied by the subject. This, too, is reflected
in the keypunching.
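The card layout described above can be read back with a short sketch. It is not the original input routine, which the report does not reproduce; the function name is hypothetical, and the sketch assumes that a title card carries only the four-digit ID and that the end card begins with two asterisks.

```python
def parse_deck(cards):
    """Group keypunched response cards by subject ID.

    Columns 1-4 hold the ID, column 5 the Activity number,
    columns 6-7 the response number, and columns 8-80 the response.
    """
    subjects = {}
    for card in cards:
        card = card.rstrip("\n")
        if not card.strip():
            continue                      # skip blank cards
        if card.startswith("**"):
            continue                      # end-of-subject card
        if len(card.rstrip()) <= 4:
            subjects.setdefault(card[:4], [])   # title card: ID only
            continue
        subject_id = card[0:4]
        activity = int(card[4])
        number = int(card[5:7])
        text = card[7:80].rstrip()        # spelling is kept exactly as punched
        subjects.setdefault(subject_id, []).append((activity, number, text))
    return subjects


deck = [
    "0397",
    "0397101Where Did this Boy Come from?",
    "0397105What is his means of transportation at the momet.",
    "**",
]
parsed = parse_deck(deck)
```

Because the text is copied without correction, the misspelling "momet" survives parsing, just as it survived keypunching.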



Human Test Rating Procedures 



The Judges 

To provide criterion measures against which the performance of
the computer scoring of the tests could be gauged, it was necessary
to obtain human ratings of the data. Four educational psychology
students originally were selected and employed for this purpose. Due
to changes in the availability of personnel, two of the four scorers
were replaced after the completion of the first three activities.

Procedures for Training Judges 

To provide uniformity of orientation and to improve inter-scorer
reliability, a number of procedures were utilized in the training of
the judges. To give a greater appreciation for the concept of
creativity by becoming actively involved in the creative process,
each judge was administered the Torrance Tests of Creative Thinking,
Verbal Form A. Next, a series of seminars was conducted for the
scorers during which the process of creativity and possible problems
relating to the scoring procedures were discussed. The scorers were
then provided with copies of Torrance's Guiding Creative Talent
(1962) and were asked to read selected chapters. Copies of the
Torrance Tests of Creative Thinking: Norms-Technical Manual (Torrance,
1966c) and the Torrance Tests of Creative Thinking: Directions
Manual and Scoring Guide (Torrance, 1966b) were also provided. After
the literature and manuals had been read, the judges were asked to
score a sample set of responses listed in the Scoring Guide. The
scorers then met as a group and discussed their rationale for
assigning scores to each of the individual responses. Where differences
of opinion existed between the judges and the Scoring Guide, the
possible reasons for such differences were analyzed. As a final
activity in the training process, a meeting was arranged between the
scorers and Dr. E. Paul Torrance. During this meeting the scorers
had the opportunity to raise any unresolved questions emanating from
the practice scoring which they had performed.


0397
0397101Where Did this Boy Come from?
0397102What is his name?
0397103How did he get there?
0397104Why is he there?
0397105What is his means of transportation at the momet.

FIGURE 1

A SAMPLE OF MACHINE-READABLE RESPONSES TO THE
TORRANCE TESTS OF CREATIVE THINKING


Additional steps taken to improve reliability included: a) a
discussion of the optimal amount of time for scoring in any one
sitting; b) the provision of a "paste-up" of the scoring manual
that enabled the scorers to view one activity or sub-test at a
glance; and c) the scoring of the responses of all subjects to one
activity before proceeding to the next activity.

The actual scoring of the data was accomplished by the judges
both individually and in group sessions in 1969.



Reliability of the Judges 

The estimate of the reliability of the four judges taken as a
group was determined for each of the activities for the total sample,
the developmental sample, and the cross-validation sample
respectively. For each activity the reliabilities for Fluency, Flexibility,
and Originality were computed separately (as previously noted, there
is no Flexibility score for Activity 6). The analysis of variance
technique with judges acting as treatments (Winer, 1962, pp. 124-132)
was used to obtain these estimates. The pooled reliabilities for each
activity-dimension combination are reported in Tables 3, 4, and 5.
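An analysis-of-variance estimate of the reliability of the pooled ratings of several judges can be sketched as follows. The report cites Winer but does not print the formula it applied, so the sketch assumes the common form in which the reliability of the mean of k judges is one minus the ratio of the within-subjects mean square to the between-subjects mean square. The function name and the sample ratings are hypothetical.

```python
def pooled_reliability(ratings):
    """Reliability of the mean of k judges from an analysis of variance.

    ratings holds one row per subject and one column per judge.
    Assumed form: r_k = 1 - MS_within_subjects / MS_between_subjects.
    """
    n = len(ratings)          # number of subjects
    k = len(ratings[0])       # number of judges
    grand_mean = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]

    ss_between = k * sum((m - grand_mean) ** 2 for m in row_means)
    ss_within = sum((x - m) ** 2
                    for row, m in zip(ratings, row_means) for x in row)

    ms_between = ss_between / (n - 1)
    ms_within = ss_within / (n * (k - 1))
    return 1 - ms_within / ms_between


# Hypothetical scores from two judges for three subjects.
estimate = pooled_reliability([[1, 2], [3, 4], [5, 6]])
```

Under this form, disagreement among judges raises the within-subjects mean square and lowers the estimate, while perfect agreement yields a reliability of 1.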

With the possible exception of the pooled reliability for the 
Originality dimension for Activities 3, 6, and 7, the reliabilities 
found for each set of four judges were higher than expected. 

In an effort to determine the response tendencies of each of the
scorers, the means and standard deviations of the judges' scores for
the respective activity-dimension combinations were obtained. These
may be found in Tables 6, 7, and 8. As can be seen, these means and
standard deviations are highly similar for the Fluency and Flexibility
dimensions. Some substantial differences may be noted for the
Originality scores, which is not surprising since the assessment of
Originality is the most difficult task a judge must perform in scoring the
TTCT. This suggests that a weighted composite of the judges' scores
might yield a more reliable estimate of Originality. Results reported
in a derivative study by Greene (1970), where factor analytic
techniques were employed to derive differential weightings for judges,
do not indicate great increases in the reliability of those scores.
It appears, then, that the use of a simple composite of judges' scores
as the criteria for this research is warranted.

[Table: RELIABILITY ESTIMATES FOR JUDGES USING ANALYSIS OF VARIANCE,
CROSS-VALIDATION SAMPLE]

[Table: MEANS AND STANDARD DEVIATIONS FOR JUDGES]



The Tags 

The isolation of a set of concepts to be used as tags or cate- 
gory headings is a process that is of major importance to the outcome 
of content analysis research. Zieky (1968, p. 38) observed: 

If the most rigorous of validating procedures is used
to confirm category membership, if the coding procedure
is perfectly reliable, and if good statistical
techniques are used to analyze the data, the research will
nevertheless be of little use unless those concepts
represented by the categories are relevant and
theoretically meaningful.

Usually, the researcher at this point is faced with two choices: he
may either isolate his own set of concepts, or utilize those
represented by a pre-existing content analysis dictionary. In the present
research, however, tags had been isolated, but the dictionaries
associated with these concepts had not been constructed (Archambault,
1969).

As previously mentioned, Torrance and his associates have iso- 
lated sets of concepts to be used in the scoring of each activity of 
the TTCT. That the concepts are relevant and theoretically meaningful 
has been argued repeatedly by Torrance in the discussions of the 
various validity studies of the tests. This ideal situation indi- 
cated not only that these concepts would be used in the present re- 
search, but also that the techniques to be employed were both appro- 
priate and justified. 

For Activity 1, 22 concepts have been used in the assignment of
Flexibility scores to the subjects' responses. A listing of these
concepts is given in Figure 2. Since the meaning attached to each
concept is explained in detail in the Scoring Guide, only the
category headings have been given here. The 21 tags of Activity 2 and
Activity 3 are the same as those listed for Activity 1 with the
following exceptions: Tag 13 of Activity 1, Personal Possessions or Past
History of Figure, is deleted; Tag 22 of Activity 1, the Whole Picture,
is deleted; Tag 17 of Activity 1, Reflective Surface and Reflection
Itself, becomes two tags in Activities 2 and 3, one tag for the
Reflective Surface and the other tag for the Reflection Itself.


 1. Characters
 2. Costume
 3. Description of the Figure
 4. Emotions
 5. Ethnic Factors
 6. Family
 7. Ground Surface and Characteristics of Objects on it
 8. Hat
 9. Location
10. Magic
11. Occupation
12. Pants
13. Personal Possessions or Past History of Figure
14. Physical Action Related to Reflective Surface
15. Physical Action Unrelated to Water
16. Physical Characteristics of Objects or Situation
17. Reflective Surface and Reflection Itself
18. Shirt
19. Shoes
20. Time
21. Underwater
22. Whole Picture

FIGURE 2

LISTING OF THE CATEGORY HEADINGS USED IN
SCORING ACTIVITIES 1, 2, AND 3 OF THE TTCT

NOTE: For Activities 2 and 3 tags 13 and 22 were deleted. Tag 17
was subdivided into two tags for Activities 2 and 3.


The specific categories for Activities 4 and 5 are listed in
Figure 3 and Figure 4. No flexibility category exists for Activity 6.
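The derivation of the 21 tags for Activities 2 and 3 from the 22 tags of Activity 1 can be checked with a brief sketch. The tag names are those of Figure 2; the function name and the order in which the two new tags are listed are assumptions.

```python
ACTIVITY_1_TAGS = {
    1: "Characters", 2: "Costume", 3: "Description of the Figure",
    4: "Emotions", 5: "Ethnic Factors", 6: "Family",
    7: "Ground Surface and Characteristics of Objects on it",
    8: "Hat", 9: "Location", 10: "Magic", 11: "Occupation", 12: "Pants",
    13: "Personal Possessions or Past History of Figure",
    14: "Physical Action Related to Reflective Surface",
    15: "Physical Action Unrelated to Water",
    16: "Physical Characteristics of Objects or Situation",
    17: "Reflective Surface and Reflection Itself",
    18: "Shirt", 19: "Shoes", 20: "Time", 21: "Underwater",
    22: "Whole Picture",
}


def tags_for_activities_2_and_3(activity_1_tags):
    """Drop tags 13 and 22 and split tag 17 into two separate tags."""
    tags = []
    for number in sorted(activity_1_tags):
        if number in (13, 22):
            continue                                  # deleted tags
        if number == 17:
            tags.extend(["Reflective Surface", "Reflection Itself"])
        else:
            tags.append(activity_1_tags[number])
    return tags


derived_tags = tags_for_activities_2_and_3(ACTIVITY_1_TAGS)
```

Two deletions and one subdivision take the 22 tags of Activity 1 to the 21 tags stated for Activities 2 and 3.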



For Activity 7 isolated concepts were not provided and a shift
in attitude was used to denote flexibility. Rather than approach
this shift parameter directly, 39 categories were isolated (Greene,
1970) to function as an estimator of the shift parameter. These
categories were derived by considering the originality weight list
provided by Torrance for this activity and by examining the actual
subject responses in the developmental sample; however, no specific
name was assigned to each category, and they were merely numbered.
The construction of the categories for each of these tags will be
discussed in the next section.
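One way in which numbered categories might stand in for the shift parameter is sketched below. This is an assumed reading, not the procedure documented in the report: each response is taken to carry one of the 39 category numbers, the first response sets the starting focus and earns nothing, and each category not seen before counts as one shift. The function name and the sample sequence are hypothetical.

```python
def shift_score(categories):
    """Count shifts in focus from a sequence of category numbers.

    The first response is not scored; a return to a category
    already credited earns no additional point.
    """
    if not categories:
        return 0
    seen = {categories[0]}        # starting focus, not scored
    score = 0
    for category in categories[1:]:
        if category not in seen:
            seen.add(category)
            score += 1
    return score


# Hypothetical category numbers for six responses to Activity 7.
example_shifts = shift_score([3, 3, 7, 3, 12, 7])
```

Here the subject moves from category 3 to 7 and later to 12, so two shifts are credited; the returns to categories 3 and 7 add nothing.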

In addition to the tags for Flexibility, it was necessary to
designate concepts for the scoring of Originality. Since responses
to the first three Activities of the TTCT are assigned Originality
weights of either 0, 1, or 2, a tag for each weight was isolated.
Actually, since the categories constructed for the Originality tags
are combinations of entries from the Flexibility dictionaries, it
would have been possible to weight the entries in the Flexibility
categories with the appropriate Originality loadings. This procedure,
however, would have necessitated complex modifications in the SCORTXT
program. For this reason the isolation of tags for Originality seemed
more appropriate.

The categories for Originality for Activities 4, 5, and 7 were
implicitly defined by Torrance. As will be explained in the next
section, the Originality 0-weight has no value in the computerized
scoring process, but it is automatically generated from the
Flexibility categories. The process of assessing Originality in Activity
6, "Unusual Questions", is unique. The technique for categorizing
the questions is presented again in Figure 5 with the corresponding
point values. Rather than considering six categories, one for each
factor or type of question, only three categories were generated.
These categories were based on the point values 1, 2, and 4. Even
though the original six categories have been collapsed to three, no
usable information has been sacrificed.
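The collapse of the six cells of Figure 5 into three point-value categories can be illustrated with a minimal sketch. The point values are those of Figure 5; the key labels, function names, and sample questions are assumptions.

```python
# Point values from Figure 5, keyed by (type of answer, focus).
ACTIVITY_6_POINTS = {
    ("simple", "personal"): 1, ("simple", "factual"): 0,
    ("complex", "personal"): 2, ("complex", "factual"): 0,
    ("divergent", "personal"): 4, ("divergent", "factual"): 4,
}


def point_category(answer_type, focus):
    """Return the point-value category (1, 2, or 4), or None for 0 points."""
    points = ACTIVITY_6_POINTS[(answer_type, focus)]
    return points if points else None


def originality_activity_6(classified_questions):
    """Sum the point values of the classified questions."""
    return sum(ACTIVITY_6_POINTS[q] for q in classified_questions)


# The distinct nonzero point values are the three categories retained.
categories = sorted({p for p in ACTIVITY_6_POINTS.values() if p})

# Hypothetical classifications of three questions from one subject.
score = originality_activity_6([
    ("simple", "personal"), ("complex", "factual"), ("divergent", "factual"),
])
```

Since cells that share a point value contribute identically to the score, and zero-point cells contribute nothing, grouping by point value loses nothing that the score depends on.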



 1. Adaptation
 2. Addition
 3. Change Color
 4. Change Shape
 5. Combination
 6. Division
 7. Magnification
 8. Minification
 9. Motion
10. Multiplication
11. Position
12. Quality of Material
13. Rearrangement
14. Reversal
15. Ear Appeal
16. Touch Appeal
17. Eye Appeal
18. Smell Appeal
19. Substitution
20. Subtraction
21. Humanization

FIGURE 3

FLEXIBILITY CATEGORY HEADINGS FOR ACTIVITY 4


 1. Animal Shelter
 2. Animal Uses Other than Shelter
 3. Art Uses
 4. Buildings
 5. Construction Uses
 6. Carrier
 7. Container
 8. Costume
 9. Cover
10. Destruction
11. Education
12. Furniture
13. Games
14. Growing
15. Household Appliances and Other Items
16. Protection
17. Scientific Uses and Equipment
18. Storage
19. Tools
20. Toy Furniture or Household Appliances
21. Toys
22. Transportation (Air)
23. Transportation (Surface)
24. Weapons

FIGURE 4

FLEXIBILITY CATEGORY HEADINGS FOR ACTIVITY 5


Type of Question     Personal     Factual

Simple Answer        1 point      0 points
Complex Answer       2 points     0 points
Divergent            4 points     4 points

FIGURE 5

ORIGINALITY CATEGORY HEADINGS AND WEIGHTS
FOR ACTIVITY 6



Category Construction

When the tags to be used in the content analysis had been
decided upon, the next step in the construction of the dictionaries
was the assignment of words and phrases to the particular categories.
The major theoretical basis for the word assignment was the simple
and widely accepted principle of synonymity. The operational
definition upon which the category construction was based has been given
by Zieky: "Those words which are synonyms of a given word are listed
under the main entry for that given word in a standard dictionary of
English synonyms or a standard thesaurus" (1968, p. 42).

Sedelow, Sedelow, and Ruggles (1964, p. 220) established a
precedent for the use of such a procedure:

Because conventional thesauri are organized in terms of
putative semantic relationships, we have chosen to use
the thesaurus form as the basis for the VIA program. We
take it, further, that semantic similarity is perceived
in part in terms of word roots, i.e., words with the
same root are likely to have meanings which have some
connection with each other. Our VIA thesaurus, therefore,
is constructed on the basis of (a) identical root, (b)
synonymity and antonymity. . .

The procedure followed in the construction of categories, based
on the procedures listed by Zieky (1968, pp. 43-46), will be discussed
in two sections. This is necessary since different operations were
followed in the construction of the Flexibility and Originality
categories. The following are the operations for the construction
of the Flexibility categories:

(1) Each conceptual heading or tag was found in Roget's
International Thesaurus (1962). For example, a search of the tag "magic"
produced words such as "witchcraft," "spell," "sorcery," "charm," etc.

(2) The word list was copied on IBM cards, one word to a card,
with the exclusion of those words marked as dialectical, vulgar,
colloquial, jocular, or slang.

(3) The same procedure was followed with word lists that appeared
under headings cross-referenced by the original main entry.

(4) The synonyms of each word were then found in Soule's
Dictionary of English Synonyms (1966), and entered on IBM cards.

(5) The synonyms of the first-order synonyms were similarly
retrieved to form a list of second-order synonyms. It was determined
that retrieval of third-order synonyms was not profitable in terms
of the ratio of additions to the category to the number of words
retrieved.

(6) Since it is desirable for the categories to be mutually
exclusive as has been stressed by Budd (1966) and Pool (1959),
phrases were used to disambiguate the meaning of words which might
appear in more than one category. For example, the word "play" is
appropriate for both category 11 and category 13. For category 11,
the phrase "play a part" was the meaning desired while "play in
water" was the meaning intended for category 13. When the phrases
are employed instead of the single words the difference in meaning
is obvious to both the reader and the computer.

(7) After the list derived from the thesaurus and synonym
dictionary was completed and a check for errors was made,
morphophonemic variants of root words were added to the synonym list. For
example, the words "jumped" and "jumping" were added to the list
which had formerly contained the word "jump."

(8) Throughout the category construction process listings of
the responses of the developmental sample were consulted to determine
if pertinent entries had been omitted. The precedent for this
approach was established by Stone, et al. (1966, pp. 147-148):

Thus the language characteristics of the target
population of the study affect the dictionary by
establishing which theoretical distinctions will
be meaningful in practice and by affecting the
assignment of entry words to particular tags on
the basis of their usage. We increasingly use
samples of the text to be analyzed in creating
lists of entry words, determining the most appropriate
definitions (on the basis of the usage
exhibited in these samples from documents of this
particular language community) and checking whether
the levels of abstraction of our tag categories are
feasible. A dictionary is the product that emerges
from the dual demands of theory on the one hand and
concrete data on the other.

(9) The resulting list of words and phrases formed the category
which was given another check for errors. These procedures were
generalized to the remaining activities of the TTCT (with the exception
of Activity 6). The categories were then keypunched on IBM
cards in the format specified for the operation of the SCORTXT program.
A computer listing of the dictionary was then obtained and
the keypunching verified.
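
The synonym-expansion part of the steps above (steps 4 and 5) can be
sketched in a few lines. This is only an illustration under stated
assumptions: the synonym table below is a made-up stand-in for the
thesaurus and synonym dictionary, and the function name
expand_category is hypothetical. The original work was done by hand
on punched cards, not with this code.

```python
# Toy synonym table (hypothetical entries, for illustration only).
SYNONYMS = {
    "spell": ["charm", "incantation"],
    "charm": ["amulet", "talisman"],
    "incantation": ["chant"],
}

def expand_category(seed_words, synonyms, orders=2):
    """Grow a category from its thesaurus word list.

    Each pass adds the synonyms of the words added in the previous
    pass, so orders=2 yields first- and second-order synonyms.
    Returns the category and the number of new words found per pass.
    """
    category = set(seed_words)
    frontier = list(seed_words)
    added_per_order = []
    for _ in range(orders):
        new_words = []
        for word in frontier:
            for syn in synonyms.get(word, []):
                if syn not in category:
                    category.add(syn)
                    new_words.append(syn)
        added_per_order.append(len(new_words))
        frontier = new_words
    return category, added_per_order

# Word list for the tag "magic", as in the example in step 1.
category, added = expand_category(
    ["witchcraft", "spell", "sorcery", "charm"], SYNONYMS)
print(sorted(category))
print(added)  # new words found at each synonym order
```

The per-order counts give a rough handle on the ratio mentioned in
step 5: when a further order adds few new words relative to the
number retrieved, expansion can stop.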

The Originality categories (with the exception of Activity 6)
were constructed from the Flexibility categories that had already
been generated. Again, the criterion for inclusion in each category
was synonymity. The steps included in the construction of these
categories were:

(1) For each Flexibility category the weights given to the 
frequent responses listed in the Scoring Guide were analyzed. It 
will be recalled that frequent responses are given Originality 
weights of zero and one. 

(2) For each response given an Originality weight of zero in
the manual, the key word or phrase that would enable the scorer to
classify this response in the particular Flexibility category was
extracted. This word or phrase would then be entered in the Originality
category of zero weights. For example, for Activity 1 in
Flexibility category 20 the response, "How old is he?", is given a
zero Originality weight. Since Flexibility category 20 deals with
time, the key word in this response is "old." The word "old," which
is in the Flexibility dictionary, would, therefore, be entered in
the zero Originality category.

(3) For each word or phrase entered in the zero Originality
dictionary by "2" above, all synonyms of the word or phrase occurring
in the Flexibility dictionary were extracted and included in the zero
Originality category. As an example, the word "age" is a synonym of
"old" and is included in the Flexibility dictionary. The word "age"
should, therefore, be assigned an Originality weight of zero.

(4) The procedures of steps 2 and 3 above were repeated for
responses given an Originality weight of one in the Scoring Manual.
These entries constitute the Originality 1 category.

(5) Any entries of the Flexibility dictionary which had not
been entered in either the zero or one Originality categories were
entered in the Originality 2 category. This was justified under
the guidelines set forth by Torrance, who stated: "A judgment has
to be made concerning the obviousness of a response when it is not
included in the lists accompanied by Originality weights and Flexibility
categories. Most responses not included in these lists
should be given maximum Originality weights" (1966c, p. 20).
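
The five steps above amount to partitioning the Flexibility
dictionary into three weight classes. A minimal sketch follows,
assuming a toy dictionary: "old" and "age" come from the example in
the text, while "tomorrow," "century," and "elderly" are hypothetical
entries. The function name and the rule that a zero weight takes
precedence over a weight of one are likewise assumptions made for the
illustration.

```python
def build_originality(flex_words, synonyms, zero_keys, one_keys):
    """Assign each Flexibility entry an Originality weight of 0, 1, or 2."""

    def with_synonyms(keys):
        # Key words plus their synonyms, restricted to entries that
        # already occur in the Flexibility dictionary (steps 2 and 3).
        found = set()
        for key in keys:
            found.add(key)
            found.update(synonyms.get(key, []))
        return found & flex_words

    zero = with_synonyms(zero_keys)
    one = with_synonyms(one_keys) - zero  # assumed precedence: zero wins
    weights = {}
    for word in flex_words:
        if word in zero:
            weights[word] = 0
        elif word in one:
            weights[word] = 1
        else:
            weights[word] = 2  # step 5: everything left over
    return weights

flex_words = {"old", "age", "tomorrow", "century"}
synonyms = {"old": ["age", "elderly"]}
weights = build_originality(flex_words, synonyms,
                            zero_keys=["old"], one_keys=["tomorrow"])
print(sorted(weights.items()))
```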

The basic structure for the three categories of the Originality
dictionary of Activity 6 was generated by identifying the key phrases 
in the examples provided by Torrance. His definition of each type 
of question was also considered. Although the scoring procedure 
involved here is complex, a relatively simple device for the genera- 
tion of Originality categories emerged. The appropriateness of this 
scheme was justified when the dictionary generated by it proved to be 
a most useful predictor. 




Computerized Scoring Procedures Utilizing the SCORTXT Program 

When all of the steps in the preparation of the data and the
building of dictionaries had been taken, the next procedure to be
completed was the automated reduction of subjects' responses to
instances of category membership. That is, through the use of a
computer programmed to perform the task, the verbal input was reduced
to a series of numbers indicating scores for the various categories.
Concurrent with this reductive process, average word length,
average sentence length, number of periods, number of question marks
and other actuarial measures were calculated. This entire procedure
was performed by Fisher's SCORTXT program (1968).
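
To make the notion of actuarial measures concrete, the sketch below
computes a handful of them for one response. It is not SCORTXT: the
tokenizing rules (what counts as a word or as the end of a sentence)
and the function name are assumptions chosen for the illustration.

```python
import re

def actuarial_measures(text):
    """Compute simple surface statistics for a block of text."""
    words = re.findall(r"[A-Za-z']+", text)
    # Assumed rule: a sentence ends at one or more of . ? !
    sentences = [s for s in re.split(r"[.?!]+", text) if s.strip()]
    lengths = [len(w) for w in words]
    return {
        "n_words": len(words),
        "n_sentences": len(sentences),
        "n_periods": text.count("."),
        "n_question_marks": text.count("?"),
        "n_commas": text.count(","),
        "avg_word_length": sum(lengths) / len(words) if words else 0.0,
        "avg_sentence_length":
            len(words) / len(sentences) if sentences else 0.0,
        # Number of words of length one through ten.
        "words_of_length": {k: lengths.count(k) for k in range(1, 11)},
    }

measures = actuarial_measures("How old is he? Why is he sad?")
for name, value in measures.items():
    print(name, value)
```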

The SCORTXT system consists of a main program and nine subroutines,
all written in PL/1. Although the program currently runs
under the IBM 360 OS system, there is no machine dependence built
into the program. The program itself has four sections. The first
determines the run options for any particular analysis. These include
the printing of text in string or array form, the removal or
retention of punctuation, the choice of card margins to be scanned,
the maximum storage size for the input text and the dictionary,
and the printing of the item analyses for each category in the
dictionary. In addition, punctuation counts and word length statistics
can also be calculated by the program.

The second section is devoted to the construction of a dictionary
file. As mentioned, both words and phrases may be included in the
dictionary. Dictionary entries are sorted internally into alphabetical
sequence, so the categories may be entered in any order. The
third section creates a text file of the data to be processed, which
in this case are the students' responses to the TTCT. The actual
scoring is done in the fourth section of the program by the use of
a binary search algorithm. To the writers' knowledge, this is the
only phrase lookup algorithm employing the binary search technique
throughout. After the text has been processed, the category counts
and the counts for the various word length statistics are punched
out on IBM cards. Printed output is determined by the options that
have been selected for any particular run. It should be noted that
SCORTXT was devised to be generally applicable to a wide range of
natural language analysis problems. Any of the subroutines may be
used independently; any number of texts may be processed with any
number of dictionaries; phrases as well as single words may be
included in the dictionary, and those phrases need not be fully defined.
For example, if phrases such as "playing near the water,"
"playing in the water," and "playing with the water" are members of a
dictionary, they could all be coded by the single entry "playing x
the water," where x is defined as the 0-8-2 punch.
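
The internals of SCORTXT are not given here, so the following is only
a sketch of how a sorted dictionary, binary search, and a wildcard
position could fit together. The "*" symbol (standing in for the
wildcard punch), the longest-match-first rule, the four-word window,
and all function names are assumptions. Categories 11 and 13 follow
the "play" example given earlier.

```python
import re
from bisect import bisect_left

WILD = "*"  # stand-in for the wildcard punch (an assumed symbol)

def build_dictionary(entries):
    """Sort the entries so they can be binary-searched."""
    items = sorted((tuple(p.split()), c) for p, c in entries.items())
    return [k for k, _ in items], [c for _, c in items]

def lookup(keys, cats, phrase):
    """Binary search for an exact entry; return its category or None."""
    i = bisect_left(keys, phrase)
    if i < len(keys) and keys[i] == phrase:
        return cats[i]
    return None

def candidates(window):
    """The window itself, then copies with one interior word wildcarded."""
    yield window
    for j in range(1, len(window) - 1):
        yield window[:j] + (WILD,) + window[j + 1:]

def score(text, keys, cats, max_len=4):
    """Count category hits, trying the longest phrase at each position."""
    words = tuple(re.findall(r"[a-z']+", text.lower()))
    counts = {}
    i = 0
    while i < len(words):
        step = 1
        for n in range(min(max_len, len(words) - i), 0, -1):
            hits = (lookup(keys, cats, w) for w in candidates(words[i:i + n]))
            cat = next((c for c in hits if c is not None), None)
            if cat is not None:
                counts[cat] = counts.get(cat, 0) + 1
                step = n  # skip past the matched phrase
                break
        i += step
    return counts

keys, cats = build_dictionary({"play a part": 11,
                               "playing * the water": 13})
print(score("They are playing near the water. He wants to play a part.",
            keys, cats))
```

Because the keys are kept sorted, each lookup costs a logarithmic
number of comparisons in the size of the dictionary, for phrases as
well as for single words.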

To maximize the prediction of each subject's scores for each
activity of the TTCT the step-wise multiple regression technique
was employed. Since scores for Fluency, Flexibility, and
Originality are predicted differently, they will be discussed individually
in this section. However, each of the approaches to be
described is based on the rationale given by Torrance in the Directions
Manual and Scoring Guide (1966b).

As mentioned previously, the Fluency score for each Activity is
defined as the sum of the Fluency scores for the individual responses.
Since no consideration is given at this point to the category in which
the response falls, the punctuation and word length statistics calculated
by SCORTXT were the only variables used in these predictions.
Those SCORTXT variables considered were chosen by the following rule:
each variable with a mean and standard deviation greater than one,
i.e., each variable for which instances were actually found in the
responses, was included in the prediction equations. Twenty-four of
the 59 statistics calculated by SCORTXT were isolated by this rule.
Figure 6 gives a listing of these variables. The results and prediction
equations will be presented and discussed in the following
chapters.
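
The selection rule just described is easy to state in code. In the
sketch below the variable names and values are invented, and the use
of the population standard deviation (rather than the sample form) is
an assumption, since the text does not say which was used.

```python
from statistics import mean, pstdev

def select_variables(table):
    """Keep variables whose mean and standard deviation both exceed one.

    table maps a variable name to its values across subjects.
    """
    return [name for name, values in table.items()
            if mean(values) > 1 and pstdev(values) > 1]

# Hypothetical values for three subjects.
table = {
    "n_words": [10, 20, 30],
    "n_semicolons": [0, 0, 1],   # rarely occurs, so it is dropped
}
print(select_variables(table))
```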

The Flexibility score for each activity is defined as the sum
of the Flexibility scores for each response, when replications for
any category have been deleted. The category scores from SCORTXT
were, therefore, used in the prediction of Flexibility. In addition,
since high correlations were found between some of the actuarial
measures and the Flexibility criteria, these also were used in the
prediction. Those actuarial variables considered are the same variables
isolated previously for Fluency. The results and prediction
equations for Flexibility will also be presented and discussed in the
following chapters.
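
Read literally, the definition above makes the Flexibility criterion
the number of distinct categories touched by a subject's responses.
A minimal sketch, assuming each response has already been assigned a
category number (or None when it fits no category); the function
name is hypothetical:

```python
def flexibility_score(response_categories):
    """Count distinct categories, ignoring repeats and uncategorized items."""
    return len({c for c in response_categories if c is not None})

# Five responses: two fall in category 20, one each in 11 and 13,
# and one could not be categorized.
print(flexibility_score([20, 20, 11, None, 13]))
```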

The equation for the prediction of Originality was based on both 
the Originality dictionary counts and the actuarial measures. In
the choice of relevant actuarial variables for the prediction of 
Originality, the reasoning was the same as given for Flexibility. 
Again, the results and prediction equations will be presented and 
discussed in subsequent chapters. 

As mentioned earlier in this section, the automated scoring
technique was based primarily on the scoring paradigm developed by
Torrance. The inclusion of actuarial measures as prediction variables
did deviate from the model, but the use of these measures in
the prediction of the various scores did seem justified in light of
the correlations between the predictors and the criteria. The
validities of the multiple regression procedures are evaluated by
means of standard cross-validation techniques.

Summary 

Responses of 153 subjects to the Torrance Tests of Creative
Thinking, Verbal Form A, were selected to serve as the data for the
present research. To provide simulation criteria, the responses of 
each subject were rated by four trained human judges. High pooled 
reliabilities were found for the ratings given. The responses to 
the TTCT were keypunched on cards for subsequent evaluation by the 
computer, and dictionaries to be used in the automated scoring were 
constructed. The final strategies for the computerized scoring of 
the TTCT were also formulated. 

Number of Question Marks
Number of Commas
Number of Periods
Number of Words of Length One
Number of Words of Length Two
Number of Words of Length Three
Number of Words of Length Four
Number of Words of Length Five
Number of Words of Length Six
Number of Words of Length Seven
Number of Words of Length Nine
Number of Words of Length Ten
Number of Words
Number of Sentences
Number of Paragraphs
Average Word Length
Average Sentence Length
Average Paragraph Length
Standard Deviation of Word Length
Standard Deviation of Paragraph Length
Third Moment of Word Length
Fourth Moment of Word Length

FIGURE 6

ACTUARIAL VARIABLES INCLUDED IN PREDICTION EQUATIONS
FOR FLUENCY, FLEXIBILITY, AND ORIGINALITY

CHAPTER IV



RESULTS — ACTIVITIES 1, 2, AND 3 



Since modifications in the procedures for analyzing the results
of this investigation were introduced into the analyses of the last
four activities, the findings will be reported in two separate chapters.
Thus the more sophisticated statistical treatments for Activities 4
through 7 represent various approaches that tend to maximize prediction.
The simpler and more utilitarian approach taken for Activities
1 through 3 is reported in the present chapter and is described
separately for the reader who may wish to explore alternative
strategies in similar types of research settings.



Establishing Prediction Equations 

As indicated in the previous chapter, the step-wise multiple
regression technique was employed to maximize the prediction of subjects'
scores for each activity of the TTCT. Since nine scores were
predicted for each individual, that is, a Fluency, Flexibility, and 
Originality score for each of the three activities, nine separate 
analyses were performed yielding nine different prediction equations. 

The results of the step-wise multiple regression analysis for
Activity 1, Fluency are presented in Table 9. Since Fluency is defined
by Torrance as the number of appropriate responses given by the
subject, it is not surprising that the variable "number of sentences,"
which is isomorphic to the number of responses, is the best predictor.
It is surprising, however, that the multiple correlation coefficient
is so high, .93, since no scheme for the determination of the appropriateness
of the responses was incorporated in the scoring procedure.
It is interesting to note the changes in the multiple-R coefficients 
when the number of predictors is increased in a step-wise manner. At 
Step 10, that is, when ten predictors are included, the multiple-R 
coefficient is .96 while at Step 24 a multiple-R coefficient of .97 
is found, an increase of only .01. These data seem to indicate that 
the accuracy in prediction will not be significantly reduced by the 
elimination of some predictors. More will be said about this point 
in following chapters. 
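
The step-wise behavior described above, a large first step followed
by small gains in multiple-R, can be reproduced with a plain forward
selection. The sketch below is an illustration only: the data are
invented, the entry rule (take the predictor that raises multiple-R
most, with no significance test for entry or removal) is a
simplification, and the function names are hypothetical.

```python
def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x

def multiple_r(columns, y):
    """Multiple correlation of y with the predictors (intercept included)."""
    n = len(y)
    X = [[1.0] + [col[i] for col in columns] for i in range(n)]
    k = len(X[0])
    xtx = [[sum(X[i][a] * X[i][b] for i in range(n)) for b in range(k)]
           for a in range(k)]
    xty = [sum(X[i][a] * y[i] for i in range(n)) for a in range(k)]
    b = solve(xtx, xty)  # the b-weights, intercept first
    pred = [sum(b[j] * X[i][j] for j in range(k)) for i in range(n)]
    ybar = sum(y) / n
    ss_res = sum((y[i] - pred[i]) ** 2 for i in range(n))
    ss_tot = sum((v - ybar) ** 2 for v in y)
    return max(0.0, 1 - ss_res / ss_tot) ** 0.5

def forward_stepwise(predictors, y):
    """Enter one predictor per step; return (name, multiple-R) per step."""
    remaining = dict(predictors)
    chosen, history = [], []
    while remaining:
        best = max(remaining, key=lambda name: multiple_r(
            [predictors[c] for c in chosen] + [remaining[name]], y))
        chosen.append(best)
        del remaining[best]
        history.append((best, multiple_r([predictors[c] for c in chosen], y)))
    return history

# Invented data for eight subjects.
predictors = {
    "n_sentences": [3, 5, 2, 8, 6, 4, 7, 1],
    "n_commas":    [1, 0, 2, 1, 3, 0, 2, 1],
    "n_words":     [14, 22, 11, 30, 29, 15, 33, 6],
}
fluency = [3, 5, 2, 7, 6, 4, 7, 1]
for name, r in forward_stepwise(predictors, fluency):
    print(f"{name:12s} multiple-R = {r:.3f}")
```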

Tables 10 and 11 summarize the results of the multiple regression
analyses for Fluency, Activities 2 and 3. Since the data are
such that the number of paragraphs is equivalent to the number of
sentences, the best predictor was again found to be the number of
responses given. Although the sequence of variables entered into
the regression equations is not the same for the three Fluency
criteria, the pattern of increase in the multiple-R coefficients
is the same as previously noted.

The results of the regression analyses for the Flexibility
criteria of Activities 1, 2, and 3 are presented in Tables 12, 13,
and 14, respectively. Unexpectedly, the number of responses made
by the subjects was found to be the best predictor of Flexibility
for Activities 1 and 2. As was hypothesized, "category counts,"
which were determined from the dictionary constructed according to
the guidelines set forth by Torrance, were also found to be of
great value as predictors. For Activity 3, Flexibility, "number
of words" was the first predictor extracted in the step-wise analysis.
At first glance this result might appear to be discordant with the
Flexibility result for Activities 1 and 2; however, since the intercorrelations
among the variables are high (for these data, correlations
of .83 were found to exist between "number of words" and both
"number of sentences" and "number of paragraphs"), the "number of
words" used can also be considered a measure of the number of responses
given. For Activity 3, it was expected that the variable
"category counts" would be a better predictor of Flexibility than
it was actually found to be. However, as with Fluency, the multiple-R
coefficients were higher than had been anticipated. This will be
discussed in more detail in Chapter VI.

Tables 15, 16, and 17 summarize the results for the Originality
criteria of Activities 1, 2, and 3, respectively. "Category counts"
was the best predictor of Originality for Activity 1, but for Activities
2 and 3 it was not as effective as had been hypothesized. The
multiple-R coefficients (.93 for Activity 1, .91 for Activity 2, and
.83 for Activity 3) were again higher than anticipated.



Cross-Validation of Prediction Equations 

To cross-validate these results the prediction equations derived
from the analyses of the developmental data were applied to a new set
of data for which N was 53. That is, for each of the nine criteria
the established b-weights were used to predict the scores assigned by
the human judges. These two sets of scores, the scores assigned by
the human judges and the scores assigned by the computer, were then
compared using the Pearson Product-Moment Correlation. The correlations
thereby obtained are presented in Table 18. The correlation
coefficients corrected for attenuation caused by criterion unreliability
are also reported in the table. Highly statistically significant
correlations were found for all of the nine predicted scores.
Moreover, significant increases in the correlations were found when
the correction for attenuation was made. The reasons for and implications
of these results will be discussed in Chapter VI.
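
The cross-validation procedure can be sketched as three small steps:
apply the fixed b-weights to new cases, correlate the predicted
scores with the judges' scores, and correct that correlation for
unreliability in the criterion. The sketch assumes the usual
single-sided correction, r divided by the square root of the
criterion reliability; the report does not spell out its formula.
The weights, scores, and reliability value below are invented.

```python
from math import sqrt

def predict(intercept, b_weights, row):
    """Score one case with a fixed prediction equation."""
    return intercept + sum(b * v for b, v in zip(b_weights, row))

def pearson_r(x, y):
    """Pearson product-moment correlation."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def correct_for_attenuation(r, criterion_reliability):
    """Correct r for unreliability in the criterion only (assumed form)."""
    return r / sqrt(criterion_reliability)

# Hypothetical equation and cross-validation cases:
# each row is (number of sentences, number of words).
intercept, b_weights = 0.5, [1.0, 0.1]
rows = [[3, 14], [5, 22], [2, 11], [8, 30]]
judge_scores = [5, 7, 4, 11]

computer_scores = [predict(intercept, b_weights, row) for row in rows]
r = pearson_r(computer_scores, judge_scores)
print(round(r, 3), round(correct_for_attenuation(r, 0.90), 3))
```

Because the reliability is at most one, the corrected coefficient is
never smaller than the uncorrected one, and the gap widens as the
judges' pooled reliability falls.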

TABLE 18

VALIDITY OF MULTIPLE REGRESSION EQUATIONS
IN CROSS-VALIDATION SAMPLE

                             Uncorrected      Correlation Coefficient
                             Correlation      Corrected for
Criterion                    Coefficient      Attenuation

Activity 1, Fluency             .89**              .90**
Activity 1, Flexibility         .71**              .72**
Activity 1, Originality         .74**              .83**
Activity 2, Fluency             .88**              (illegible)
Activity 2, Flexibility         .68**              .71**
Activity 2, Originality         .75**              .89**
Activity 3, Fluency             .88**              .92**
Activity 3, Flexibility         .56**              .59**
Activity 3, Originality         .72**              .99**

** significant at .01 level

CHAPTER V



RESULTS — ACTIVITIES 4, 5, 6, AND 7
(by John F. Greene)



INTRODUCTION 

In this chapter the results of the computer simulation of the 
human rating practices for Activities 4, 5, 6, and 7 are reported. 
The chapter is logically partitioned into two sections; the multiple 
regression results for the developmental sample and the results ob- 
tained in the empirical cross-validation procedures. 



Establishing Prediction Equations 

Three sets of multiple regression analyses, each one corresponding
to a particular model, are considered in this section. The first
eleven tables (19 through 29) consist of the full model results for 
each dimension of Activities 4, 5, 6, and 7. In this model, all 
appropriate variables are allowed to enter the regression process. 

The restricted model is based on a subset of predictors statistically 
selected (i.e. by means of step-wise multiple regression) from the 
full set, and certain parts of the results are equivalent to the 
corresponding sections in the full model. The number of predictors, 
or the size of the partial set, is determined by the investigator and 
will be discussed at a later point in this chapter. Two forced models 
are considered in Tables 30 and 31. In this type of analysis, the 
researcher selects a partial set of predictors and forces them into 
the analysis before the remaining variables of the full set are 
allowed to enter. If this model is to differ from the full model, it 
also must be restricted. 

The same format is employed for the presentation of results for 
each of the models. The numbers corresponding to the variable entered 
are defined in Figure 7. Variables 1, 2, and 3 represent the criteria 
or dependent variables for Fluency, Flexibility and Originality, and,
of course, will not enter as predictor variables in the analyses.
Variables 4 through 24 correspond to the actuarial variables. The 
category counts for Flexibility and Originality are variables 25 and 
26 respectively. 

The remaining columns of the tables are partitioned into two
fields. The first field contains information relative to the final
step of the particular model. Three sets of statistics are given.
First, the resultant linear weight or b-weight is presented for each
predictor variable. The intercept is also included. Then the
standard error of the b-weight is given, followed by a t-value. This
t-value is the quotient of the b-weight and the standard error, and
provides information pertaining to the value of the coefficient in
the population. The null hypothesis is that the beta-weight, or
regression coefficient in the population, does not differ from zero.

Variable Number     Variable Name

       1            Fluency criterion
       2            Flexibility criterion
       3            Originality criterion
       4            Number of commas (,)
       5            Number of periods (.)
       6            Number of question marks (?)
       7            Words of length 1
       8            Words of length 2
       9            Words of length 3
      10            Words of length 4
      11            Words of length 5
      12            Words of length 6
      13            Words of length 7
      14            Words of length 8
      15            Words of length 9
      16            Words of length 10
      17            Total number of words
      18            Total number of sentences
      19            Mean word length
      20            Mean sentence length
      21            Standard deviation of word length
      22            Standard deviation of sentence length
      23            Third moment of word length
      24            Fourth moment of word length
      25            Flexibility Dictionary
      26            Originality Dictionary

FIGURE 7

VARIABLE NUMBERS AND NAMES

The second field also contains three sets of statistics for each
step; namely, the multiple regression coefficient, the standard error
of the estimate, and an F-value. The square of the multiple regression
coefficient yields the proportion of variance accounted for in
the criterion, and is a good indicator of how well we can predict.
The standard error of the estimate is equivalent to the standard
deviation of the residuals, or differences between the predicted
score and the observed score. The F-value indicates how well a particular
set of predictors is able to estimate the criterion. Because
the most useful predictors are statistically selected first, the F-value
is expected to decrease as predictors are added. Note, however,
that at the same time the multiple correlation is increasing.
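
The statistics named above are related by standard textbook formulas,
sketched here with invented numbers. The formulas are the usual ones
for ordinary least squares with k predictors and n cases; whether the
regression program used in this study computed them in exactly this
form is an assumption.

```python
from math import sqrt

def t_value(b_weight, standard_error):
    """t is the quotient of the b-weight and its standard error."""
    return b_weight / standard_error

def overall_f(multiple_r, n, k):
    """F for the whole equation: (R^2 / k) / ((1 - R^2) / (n - k - 1))."""
    r2 = multiple_r ** 2
    return (r2 / k) / ((1 - r2) / (n - k - 1))

def standard_error_of_estimate(sd_criterion, multiple_r, n, k):
    """Residual standard deviation, from the sample SD of the criterion."""
    return sd_criterion * sqrt((1 - multiple_r ** 2) * (n - 1) / (n - k - 1))

# Hypothetical steps of one analysis with n = 100 cases:
# adding predictors raises R slightly but lowers F.
print(round(overall_f(0.90, 100, 1), 1))
print(round(overall_f(0.92, 100, 10), 1))
print(t_value(2.0, 0.5))
```

With these invented values the first step gives a far larger F than
the tenth even though the multiple correlation has risen, which is
the pattern the paragraph above describes.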

Excellent results were obtained for Activity 4 and Activity 5.
The full model multiple regression coefficients for Fluency, Flexibility,
and Originality are .99, .91, and .84 in Activity 4 and .96,
.85, and .87 in Activity 5. Equally encouraging results were noted
for Activity 6, where the values of .97 and .80 were determined for
Fluency and Originality respectively. Once again, the reader is
reminded that there is no Flexibility score for Activity 6. Moreover,
only minimal differences were found between the full and restricted
model results for these activities. In no case did this
difference exceed

Somewhat less encouraging results were generated for Activity 7,
where the multiple regression coefficients for the full model were
.92, .84, and .73. These values dropped to .90, .83, and .70 in the
restricted model. The results of the forced model for the Flexibility
and Originality dimensions of this activity were .73 and .60.
This forced model was only produced after the other two models generated
equations which did not predict the criterion scores in the
cross-validation sample to the high degree desired, as will soon be
shown.



Cross-Validation of Prediction Equations 

The cross-validation correlations appear in Table 32. Because
of criterion unreliability, an estimate corrected for attenuation is
also presented. In an effort to facilitate comparisons, the results
of all three types of models are included, as well as the corresponding
multiple correlations obtained from the last step in each of the
multiple regression tables.

As can be seen by analyzing the results presented in the table,
the attenuated cross-validation correlation coefficients for Activities
4, 5, and 6 are very high for both the full and restricted
models. The Fluency range for these Activities is defined by .93
and .96 for the full model, while the values in the restricted model
did not vary from .96. The Flexibility coefficients were .81 and .89
in the full model and .86 and .91 in the restricted model for Activities
4 and 5. Originality varies from .79 to .92 in the full model
as compared to .83 to .95 in the restricted model. Only for the
dimension of Originality were considerable differences noted between
the unattenuated and attenuated correlations. This was expected,
however, since the pooled reliabilities of the judges for this
dimension were lower than those for Fluency and Flexibility.

In Activity 7 the full and restricted model attenuated cross-validation
correlations were .87 and .85 for Fluency, .56 and .59
for Flexibility, and .48 and .62 for Originality. Although these
results indicated that the prediction equations are useful, they are
somewhat lower than the results obtained for Activities 4 through 6.
Thus, a forced model was generated for Flexibility and Originality.
Correlations of .77 and .70 were obtained when the forced model prediction
equations were compared to the observed scores.



Summary 

The results of the extensive statistical analyses of this study
were presented in Chapters IV and V. Included were several multiple
correlation analyses, which were used to generate prediction equations,
and a cross-validation approach to the evaluation of the entire
computerized scoring procedure. The discussion of these results
along with their implications for future research will be presented
in the next chapter.

CHAPTER VI



DISCUSSION OF RESULTS 



This chapter will describe some of the findings, and the implications
of the findings, from the attempt to predict scores on the
Torrance Tests of Creative Thinking by means of computer simulation.



Prediction Equations — Activities 1, 2, and 3 

In multivariate analysis, it is often pointless to elaborate a 
hypothesis for each predictor, or to explain how each variable met or 
failed to meet expectations. Moreover, since many of the predictors 
included in this research were "variables of opportunity," that is, 
variables which had been shown in previous research to be of some 
value in content analysis but not for the content analysis of re- 
sponses to creativity tests, strong relationships were not expected 
in all cases. In particular, the ten simple word length statistics 
(number of words of length one, two, three, etc.) were expected to 
aid in the prediction of creativity scores, but they were not ex- 
pected to be among the best predictors. As is evident from the data 
presented in Chapters IV and V, however, in many cases these simple 
counts were extracted early in the step-wise multiple regression 
analyses. Although the variables "number of sentences," "number of
paragraphs," and "number of words" were hypothesized and observed to 
be among the best predictors of Fluency, an unexpected finding was 
that the same variables were also found to be among the best pre- 
dictors of Flexibility and Originality. 

For the prediction of Flexibility and Originality, it was hypothesized
that the variable "category counts" would be the most important
predictor, since the counts were derived from the dictionaries
in accordance with Torrance's scoring norms. However, this was true
only for the prediction of the Activity 1, Originality scores. For
the prediction of the Flexibility scores of Activities 1 and 2,
"category counts" was the sixth best predictor; for Activity 3,
Flexibility, it was the twelfth best predictor; and for Activity 3,
Originality the variable was not entered until the 24th step of the
regression analysis. That "category counts" was less important
than had been hypothesized could have been the result of insufficient
or incorrect entries in the dictionaries or categories. But if the
correlations between "category counts" and the criteria were indeed
high, the value of "category counts" as a predictor would have been
influenced by: a) high correlations between other variables and the
criteria, and b) high correlations between these other variables
and "category counts." If this was the case, much of the variance
in the criterion which could be accounted for by "category counts"
would be extracted by the other variables. To investigate this possibility,
the simple correlations among the predictors themselves
and the correlations between the "category counts" and the various
criteria were examined. Table 33 reports the Pearson Product-Moment
Correlation Coefficients for the criteria and the various "category
counts." Due to the voluminous nature of the intercorrelations
among the variables, no such tables have been provided here.

As indicated in Table 33, the correlations of the "category
counts" with the criteria are generally very high, ranging from .54
to .81. It should be noted that the size of the correlations with
the criteria does not necessarily indicate the importance of "category
counts" as a predictor. For example, for Activity 1, Flexibility,
"category counts," which correlated .74 with its criterion,
was introduced third in its step-wise regression analysis. However,
although the correlation was .67, the same variable was introduced
in the second step when Flexibility was being predicted for Activity
3. When the intercorrelations among the predictors are considered,
the reason for this is apparent. In all cases where "category
counts" was not the best predictor, a high correlation was found between
"category counts" and one of the best predictors. For example,
when the Flexibility score for Activity 3 was sought, the best predictor,
"number of words," correlated .78 with "category counts."
It seems, then, that the dictionaries or categories were at least
adequate, but that their importance was restricted in the various
prediction equations by the importance of other predictors.
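
The argument above, that a predictor with a respectable simple
correlation can still add almost nothing once a correlated predictor
is already in the equation, can be checked with the standard
two-predictor formula for the squared multiple correlation. In the
sketch below, .60 and .78 are the values reported for Activity 3,
Flexibility; the .80 used for the correlation of "number of words"
with the criterion is a hypothetical value, since that figure is not
reported in this section.

```python
def r2_two_predictors(r_y1, r_y2, r_12):
    """Squared multiple correlation of a criterion with two predictors."""
    return (r_y1 ** 2 + r_y2 ** 2 - 2 * r_y1 * r_y2 * r_12) / (1 - r_12 ** 2)

def increment(r_y1, r_y2, r_12):
    """Variance added by predictor 2 once predictor 1 is in the equation."""
    return r2_two_predictors(r_y1, r_y2, r_12) - r_y1 ** 2

# Predictor 1: "number of words" (r with criterion assumed to be .80).
# Predictor 2: "category counts" (r = .60 with the criterion).
print(round(increment(0.80, 0.60, 0.78), 4))  # predictors correlate .78
print(round(increment(0.80, 0.60, 0.00), 4))  # same predictors, uncorrelated
```

Under these assumptions nearly all of the variance that "category
counts" shares with the criterion has already been claimed by the
first predictor, so the second enters late or with little gain.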

TABLE 33

SIMPLE CORRELATIONS BETWEEN "CATEGORY COUNTS" AND CRITERIA

                                      Step Variable Entered
Criteria                      r       into Regression Equation

Activity 1, Flexibility      .74**             3
Activity 1, Originality      .81**             1
Activity 2, Flexibility      .67**             2
Activity 2, Originality      .54**             6
Activity 3, Flexibility      .60**            12
Activity 3, Originality      .59**            24

** Significant at .05 level

As indicated, the number of responses, as measured by the variables
"number of sentences," "number of paragraphs," and "number of
words," was continually among the best predictors of each of the
three dimensions of creativity. Since these three variables are all
measures of verbal Fluency, as Fluency is understood in a literary
sense rather than as used by Torrance, it is possible that there
exists one underlying dimension for the TTCT, rather than the three
dimensions, Fluency, Flexibility, and Originality. Other evidence
for this interpretation has been reported in the literature.
Cicirelli (1964), for example, reported intercorrelations of .79 for
Fluency and Flexibility, .80 for Fluency and Originality, and .74
for Originality and Flexibility. Long and Henderson (1964) have
found average intercorrelations for samples of children in grades
two through seven of .68 for Fluency and Flexibility, .60 for Fluency
and Originality, and .80 for Flexibility and Originality. The
interpretation given these results was that subjects high or low in
Fluency, as measured by the TTCT, would likewise be high or low in
Flexibility and Originality. Or again, highly creative people would
simultaneously be Fluent, Flexible, and Original while those of low
creativity would simultaneously be less Fluent, Flexible, and
Original. However, the results reported here suggest that persons
found to be creative by the TTCT are highly Fluent and that Fluency
accounts for their high Flexibility and Originality scores. Likewise,
those persons who are not Fluent, that is, do not give many responses,
also will not be found to be Flexible and Original. On common sense
grounds, it would appear that one improves his chances of increasing
his Flexibility score simply by producing a greater number of
responses. And, of course, chances of "scoring a hit" in Originality
are likely to increase when one generates a greater number of
responses. Whether this relationship does, in fact, exist must be
determined by further research. At least two studies dealing with the
dimensionality issue and growing out of the research were reported
by Paulus (1970) and Renzulli (1970), and it is anticipated that
further research in the area of creativity must take account of the
important problems related to dimensionality.
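
One way the dimensionality question might be examined is through the
first-order partial correlation between Flexibility and Originality
with Fluency held constant. The sketch below applies the standard
partial correlation formula to the intercorrelations quoted above from
Cicirelli (1964) and Long and Henderson (1964). It is offered only as
an illustration; no such computation is reported in this study, and
the function name is an assumption made for the example.

```python
from math import sqrt


def partial_corr(r_xy, r_xz, r_yz):
    """First-order partial correlation of x and y with z held constant."""
    return (r_xy - r_xz * r_yz) / sqrt((1 - r_xz**2) * (1 - r_yz**2))


# x = Flexibility, y = Originality, z = Fluency (values quoted in the text)
cicirelli = partial_corr(r_xy=0.74, r_xz=0.79, r_yz=0.80)
long_henderson = partial_corr(r_xy=0.80, r_xz=0.68, r_yz=0.60)

print(f"Cicirelli (1964): {cicirelli:.2f}")
print(f"Long and Henderson (1964): {long_henderson:.2f}")
```

The two sets of published intercorrelations do not yield the same
partial correlation, which is consistent with the view that the
question must be settled by further research.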



The Overall Predictive Value of the Scoring 
Strategy for Activities 1, 2, and 3 

From the standpoint of overall simulation, the multiple correlations
obtained for the prediction of the pooled human judgments were
the primary goal of the analyses. For each of the nine prediction
equations derived, the results obtained were rather startling. High
multiple correlations were expected for the prediction of the Fluency
dimension of creativity, but the multiple-R's of .97, .93, and .95
obtained for Activities 1, 2, and 3 respectively were higher than
anticipated. It was hypothesized that the prediction of the
Flexibility dimension of creativity would be a harder task than the
prediction of Fluency. The multiple-R's of .91, .87, and .85 for the
Flexibility scores of Activities 1, 2, and 3 substantiate this
hypothesis. However, the prediction of Flexibility is still very good,
since much of the variance in the Flexibility criteria can be accounted
for by the set of predictors employed.

Since the assessment of Originality has been found to be a difficult
task for humans, even more difficult than the assessment of
Flexibility, it was expected that the lowest multiple-R's would be
encountered in the computerized prediction of this dimension of
creativity. It is not surprising, then, that the lowest multiple
correlation coefficient, .83, was found for Activity 2, Originality.
However, in the light of both the hypothesis and this finding, the
multiple-R's of .93 and .91 for Activity 1, Originality and Activity
3, Originality were unexpected. That the computer can judge
Originality (as defined by Torrance) to the degree observed is indeed
an important finding.

It is well known, however, that the accuracy found in the derivation
of prediction equations should not be expected if new responses
to the TTCT were taken and the discovered b-weightings were
applied to them to predict their human ratings. For any set of
scores, or any set of resultant correlations, contains not only true
variance associated with the variables, but also a certain amount of
error variance (probably unique to the particular subjects concerned)
which will not ordinarily be found with a new set of human subjects,
or responses (Page and Paulus, 1968, p. 53). The true variance gives
us information which will be subsequently useful. But the error
variance is also capitalized upon by the analysis, and a certain
portion of the multiple-regression coefficients, and of the
contributing b-weights, will spuriously contribute, but will not stand
up in a replication.

When the findings are cross-validated, then, the resulting prediction
will not always correlate as highly as one might hope. The
statistical loss is commonly spoken of as "shrinkage" and has been
widely treated in the literature (e.g., McNemar, 1962). As one would
suppose, the larger the number of subjects, the more reliable the
multiple-R will be; but the larger the number of predictors (given
the same number of subjects), the less reliable the multiple-R will
be.
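
The dependence of shrinkage on the number of subjects and the number
of predictors can be sketched with a commonly used adjusted
multiple-R formula, often attributed to Wherry. This is an assumption
made for illustration: the text above does not state which shrinkage
estimate is intended here, and the multiple-R, sample sizes, and
predictor counts below are hypothetical values rather than figures
from this study.

```python
from math import sqrt


def shrunken_r(multiple_r, n_subjects, n_predictors):
    """Estimate the shrunken multiple R from the adjusted R^2:
    1 - (1 - R^2) * (N - 1) / (N - k - 1)."""
    r2_adj = 1 - (1 - multiple_r**2) * (n_subjects - 1) / (
        n_subjects - n_predictors - 1
    )
    # A negative adjusted R^2 is reported as zero.
    return sqrt(max(r2_adj, 0.0))


# Hypothetical cases: same observed multiple R, different N and k.
for n, k in [(100, 10), (100, 30), (1000, 30)]:
    print(f"N={n:4d}, k={k:2d}: shrunken R = {shrunken_r(0.85, n, k):.3f}")
```

With the observed multiple-R held fixed, the estimate falls as
predictors are added and rises as subjects are added, which is the
pattern described above.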



The multiple-R's for the Fluency dimension of creativity
cross-validated very well, but sizable shrinkage was found for the
multiple-R's of the Flexibility and Originality dimensions. However,
when adjustments were made for the lack of perfect reliability in the
criteria (i.e., the so-called "correction for attenuation"),
significant increases in the correlations were found for both of these
dimensions. Since these correlations were influenced by both the sample
size and the number of predictors, as discussed above, the results are
not unexplainable. Also, as noted in the previous chapters, it
appears that accuracy in the prediction of any of the dimensions of
creativity would not be significantly reduced by the elimination of
some of the predictors. Moreover, if fewer predictors had been used,
the correlations found in cross-validating the results would have been
higher. The reader will note that these considerations were brought to
bear on the analyses of Activities 4 through 7.
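
The correction for attenuation mentioned above can be sketched in its
classical one-sided form, in which an observed correlation is divided
by the square root of the criterion reliability. The sketch assumes
that only the unreliability of the criterion is corrected, as the text
suggests; the function name and the numerical values are hypothetical
and are not taken from this study.

```python
from math import sqrt


def correct_for_criterion_unreliability(r_observed, criterion_reliability):
    """Classical correction for attenuation applied to the criterion only."""
    if not 0 < criterion_reliability <= 1:
        raise ValueError("criterion reliability must lie in (0, 1]")
    return r_observed / sqrt(criterion_reliability)


# Hypothetical values: an observed cross-validation correlation of .70
# and a criterion (pooled judge) reliability of .85.
corrected = correct_for_criterion_unreliability(0.70, 0.85)
print(f"corrected correlation: {corrected:.3f}")
```

The less reliable the criterion, the larger the upward adjustment; a
perfectly reliable criterion leaves the correlation unchanged.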



Prediction Evaluations — Activities 4, 5, 6, and 7 

The results of the multiple correlation analyses for Activities
4 through 7, presented in Chapter V, must be considered most
encouraging. The full model coefficients for Fluency range from .92 to
.99. The range for Flexibility is .84 to .91. And the multiple-R's for
Originality are .84, .87, .80 and .73 for Activities 4-7 respectively.
Although these results must be validated in the cross-validation
sample, they represent very high potential prediction power. The
percentage of variance accounted for varies from 53 to 98, and each
multiple correlation coefficient is significant beyond the .01 level.
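
The percentages of variance quoted above follow from squaring the
multiple correlation coefficients. A minimal sketch, using the lowest
and highest full model coefficients reported in this section (the
function name is an assumption made for the example):

```python
def percent_variance(multiple_r):
    """Percentage of criterion variance accounted for by the predictors."""
    return 100 * multiple_r**2


# Lowest (.73) and highest (.99) full model multiple-R's in the text.
for r in (0.73, 0.99):
    print(f"multiple-R = {r:.2f}: {percent_variance(r):.0f} percent of variance")
```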

The restricted model results parallel those of the full model.
In all but one equation, the multiple correlation coefficient dropped
by less than one-hundredth of a point. The greatest loss in potential
predictability was realized in Activity 7, Originality, where a .03
difference was noted. In these restricted analyses, no more than
half of the original set of predictors was utilized, with four
instances of using as few as five or six predictors.

Greater losses in the multiple-R coefficient were detected for
the two forced models. Activity 7, Flexibility dropped from .84 to
.73, and a .13 decline from .73 was noted for the Activity 7,
Originality forced model. These models were generated, however, because
of low cross-validations in their respective full and restricted
models, as will be shown below. Thus, while lower multiple
correlations were obtained, higher cross-validation correlations are
expected. The procedure for selecting which predictors were to be
forced will be explained in the next section, but one advantage of the
particular forced models considered is that they employ only three and
four predictors.

All of the multiple correlation coefficients reported are high
and significant beyond the .01 level. Before speculating on the
relative value of these results, however, the validity of the
prediction equations will be estimated in the next section.



Cross-Validation of Prediction Equations
(Activities 4 through 7) 

The attenuated cross-validation correlations appear in Table 32.
Only the first nine equations will be immediately discussed, followed
by the equations relative to the Flexibility and Originality scores
for Activity 7. The cross-validation correlations for the first nine
equations of the full and restricted models range from .79 to .96.
Each is significant beyond the .01 level. The shrinkage, or difference
between the multiple correlation and the cross-validation correlation,
is minimal, never exceeding .10.

Considerable shrinkage was noted in both the full and restricted
models for the Flexibility and Originality dimensions of Activity 7.
The attenuated cross-validation correlations of .56 and .48 in the
full model and .59 and .62 in the restricted model certainly are at
least of moderate value in view of the present state of the art;
however, in comparison with previously stated results, they are
somewhat disappointing. Hence, additional analyses were conducted, and
a third model, the forced model, was generated.
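
The shrinkage for the Activity 7 full model equations can be worked
out directly from figures given in this chapter, taking shrinkage as
the difference between the multiple correlation and the
cross-validation correlation. The sketch below pairs the full model
multiple-R's of .84 and .73 with the attenuated cross-validation
correlations of .56 and .48; the function and variable names are
assumptions made for the example.

```python
def shrinkage(multiple_r, cross_validation_r):
    """Difference between the multiple R and the cross-validation r."""
    return round(multiple_r - cross_validation_r, 2)


# (multiple-R, attenuated cross-validation r) for the Activity 7 full model
full_model_activity_7 = {
    "Flexibility": (0.84, 0.56),
    "Originality": (0.73, 0.48),
}

for dimension, (r_dev, r_cv) in full_model_activity_7.items():
    print(f"Activity 7, {dimension}: shrinkage = {shrinkage(r_dev, r_cv):.2f}")
```

Both differences are well above the .10 ceiling observed for the first
nine equations.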

As defined earlier, a forced model is one in which the re- 
searcher selects a potential set of predictors and forces them into 
the analysis before the remaining variables of the full set are 
allowed to enter. If the forced model is to differ from the full 
model, it must also be restricted. 

Before considering the process of selecting the forced predictor
variables, the rationale for using this type of model will be
discussed. In multiple regression analysis, only the full model, after
cross-validation, reflects one's ability to predict in whatever field
is being studied. The results of restricted and forced models
represent goals to be attained in future research, and each of those
models must be applied to a new sample if their validity is to be
evaluated. Thus, when working with models other than the full model,
the researcher need not necessarily restrict his efforts to only the
developmental sample. He must realize, however, that the
non-cross-validated results are tentative and contingent upon the
assumptions, however implicit, utilized in his method of generating the
restricted or forced model.

In this phase of the research, the forced predictor variables
were selected by analyzing the correlations between the predictors 
and the criterion in both the developmental and the cross-validation 
sample. Only those predictors whose correlation with the criterion 
did not vary appreciably from one sample to the other were selected. 
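
The selection rule just described can be sketched as a simple filter
over the predictor-criterion correlations in the two samples. The
sketch is illustrative only: the text does not give a numerical
definition of "appreciably," so the tolerance of .10 is an assumption,
and the correlations and the third predictor name are hypothetical.

```python
def select_stable_predictors(dev_corrs, cv_corrs, tolerance=0.10):
    """Keep predictors whose correlation with the criterion differs by
    no more than `tolerance` between the two samples."""
    return [
        name
        for name, r_dev in dev_corrs.items()
        if name in cv_corrs and abs(r_dev - cv_corrs[name]) <= tolerance
    ]


# Hypothetical correlations with the criterion in each sample.
developmental = {
    "number of words": 0.70,
    "category counts": 0.67,
    "hypothetical predictor": 0.45,
}
cross_validation = {
    "number of words": 0.68,
    "category counts": 0.60,
    "hypothetical predictor": 0.10,
}

print(select_stable_predictors(developmental, cross_validation))
```

Because the rule consults the cross-validation sample, any model built
from the selected predictors must be tested on a new sample before its
validity can be claimed.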

The attenuated "cross-validation results" (Lord-Nicholson Formula) 
for the forced model in this study were very encouraging. Correla- 
tions of .77 and .70 for the Flexibility and Originality equations 
in Activity 7 were established. Of course, these results are tenta- 
tive, and must be empirically tested in a new sample. 

As previously indicated, in multivariate analysis it is often
pointless to elaborate a hypothesis for each predictor, or to explain
how each predictor met or failed to meet expectations. Since several
of the predictors included in this study are "variables of
opportunity," that is, variables which have been shown to be of value
in earlier phases of the investigation, strong relationships were not
expected in all cases. Again, however, as Stone (1966) emphasized,
category construction is considered the most crucial stage in content
analysis. Thus, the correlations between the Flexibility and
Originality dictionaries and the criterion, as well as the step in
which these category count variables entered the regression analyses
for each activity, will be considered. These results are given in
Table 34. The correlations of the dictionaries with the criteria are
generally very high, ranging from .35 to .79 and significant beyond
the .01 level. Furthermore, these predictors entered the regression
analyses at extremely early stages. Thus, their usefulness in the
prediction process is established.



TABLE 34

CORRELATION OF DICTIONARY WITH CRITERION,
INCLUDING STEP ENTERED

Activity    Dimension        r        Step Entered

   4        Flexibility     .79**          1
   4        Originality     .64**          7
   5        Flexibility                    1
   5        Originality     .61**          2
   6        Originality     .57**          2
   7        Flexibility     .67**          1
   7        Originality     .35**          3

** Significant at .01 level